2008 New Features (Osborne/McGraw-Hill).


January 2011


Earlier this year, I attended a few IT events that
really underscored the need for continuing education
for DBAs, developers, and IT professionals.
First, at the Professional Developers Conference
(PDC) in Redmond, Microsoft CEO Steve Ballmer
declared that Microsoft is “all in” for Windows
Phone 7 development, as well as being “all in” for
its cloud-based Windows Azure and SQL Azure
platforms. (Of course, how can you be “all in” for
two completely separate technologies? I don't know,
but it is Microsoft we're talking about.) Then,
at DevConnections in Las Vegas, speakers gave
deep-dive sessions into the internals of the critical
Microsoft technologies that organizations currently
have deployed, including Windows Server, Microsoft
Exchange Server, SQL Server, and SharePoint. Right
on the heels of that event came the Professional
Association for SQL Server (PASS) Summit 2010 in
Seattle, where Microsoft and other industry experts
dove into the new features of SQL Server 2008 R2,
as well as providing a look ahead at the next release
of SQL Server (code-named “Denali”).

I've been working with Windows development 
and SQL Server since 1978, and it's obvious that 
no area of business changes faster than technology. 
Microsoft's presentations at PDC left no doubt that 
the company’s focus is on mobility and cloud—both 
being areas in which most of today's businesses and 
their IT and database professionals aren't strong. 
Although it's not clear to me whether businesses in 
the future will really adopt cloud computing tech- 
nologies, it certainly is clear that IT and database 
professionals need to keep pace with these rapidly 
changing technologies. As a database professional, 
you need to know about technologies to be able 
to help your company's management team make 
informed decisions about their future strategies. One 
of the best ways to prepare for these technologies is 
to apply yourself to learning about them even if 
you don't think your organization is going to adopt 
them. 


Have More Control Over Your 
Environment 

Without a doubt, cost savings is the driving factor 
behind the recent push for cloud technologies. How- 
ever, one of the trade-offs is that database manage- 
ment tasks such as scalability and security move out 
of your company's control and into the hands of 
the cloud provider. Personally, I don't like the idea 
of sacrificing control to save costs, and I think the 
last mile between the Internet and your business will
always be a weak link. I think it's worth paying more
to have more control and better service. That said, 
not all businesses think of their applications this 
way and many can get by with occasional outages, 
which are going to happen no matter which SLAs 
are in place. And even in organizations that are cur- 
rently leaning against the web, this all might change 
as the technology improves or as management 
within the company changes. As a database profes- 
sional, you don't want to be left on the wrong side 
of that. That's where continuing education plays an 
important role in keeping up with future technology 
trends such as mobility and cloud computing. 

Mobility and cloud aside, it's clear from events 
such as DevConnections and PASS that SQL Server 
itself is a rapidly changing technology. Back in the 
SQL Server 6.5 days, it was possible to be an expert 
in all aspects of SQL Server. SQL Server was a 
smaller product and you were able to get your hands 
around the whole of it. That's certainly not the case 
anymore. Through the years and multiple releases, 
SQL Server has grown into a much bigger, more 
complex, and definitely enterprise-capable product. 
The sessions at DevConnections and PASS cer- 
tainly brought this point home. Although I've been 
working with the product for years and know a lot 
in many areas, I know there are always important 
details you can learn from experts who specialize 
in the areas you don't frequent that often but are 
still useful to your business. Conferences such as 
DevConnections and PASS are great opportunities 
to explore the areas that you need to learn more 
about. 


Technology-Proof Your Job 
Continuing education is more important now than 
it has been at any point in the past. The products 
are bigger and more complex, requiring you to 
deal with areas that you might not be an expert 
in. In addition, although nothing's for certain, it's 
possible that the IT industry is on the verge of a 
technology shift toward cloud computing or per- 
haps a hybrid of cloud and on-premises computing. 
This could be disruptive in terms of the technol- 
ogy deployed, as well as cause a shift of jobs from 
on-premises servers to cloud-based technologies. 
Making a commitment to continued education 
by attending industry conferences, training, and 
related web events are some of the best ways to 
technology-proof your job and ensure your future 
value to your business.
InstantDoc ID 129000
QUESTIONS
ANSWERED


How Many Indexes Should I Create?


You've frequently discussed what a good
clustering key should be, as well as how
indexes work. However, you haven't told us
how many indexes we should have on a table. What's
a good number of indexes?


This question is part of a prank that Paul and I pull
on each other occasionally. When Paul's not around,
I'll tell students to ask him “If I have a database that's
X terabytes, how long will it take for CHECKDB
to run?” and when I'm not around, he tells students
to ask me “How many indexes should I have on a
table?” However, I do have to admit that both are
actually great questions! The problem is that neither
question is easy to answer. There are a tremendous
number of issues to consider, and even those don't
give you an exact answer. (To find out the answer to
the CHECKDB question, see “Determining How
Long It Takes for DBCC CHECKDB to Run,”
www.sqlmag.com/blogs/sql-server-questions-answered/sql-server-questions-answered/tabid/1977/entryid/12971/Default.aspx.)
In this article, I'll respond to the indexing question.

Generally, I would recommend you create as few
indexes as possible on any table. Over-indexing can
be far worse than under-indexing because it leads to
many problems, including data modification overhead,
wasted disk space, and wasted cache. The worst
type of over-indexing that I've seen is where an index
is created on each column, without any regard for
performance tuning. This is a horrible idea. In fact,
very narrow indexes (i.e., indexes with only one or
two columns) have very few uses in SQL Server. What
I recommend are fewer overall indexes, but wider
indexes because they have far more uses.

However, under-indexing can be very problematic 
because it limits the options the optimizer has to pro- 
cess your query requests, and it can lead to inefficient 
table scans. If a table is small (only a few megabytes), a 
table scan probably isn't horrible, but for larger tables 
(tables in gigabytes) it can be a huge problem. I've seen 
a few cases in which the database had no indexing at 
all—no data integrity constraints and no manually 
created indexes. This scenario isn't good, either. 

It sounds like a fine line between too many indexes
and too few, but it's not quite as fine of a line as you
might think. There's definitely a wide range of what
can be useful indexes. Additionally, how many indexes
are really necessary will depend on many factors, the
most important of which is the table's usage pattern.
The overall point, however, is how to create the right
indexes. To start, I'd say that most tables should have
fewer than 15 indexes. In many cases, tables that
focus on transaction processing (OLTP) might be in
the single digits, whereas tables that are used more
for decision support might be well into double digits.
However, I'd also add that most cases in which I see
40, 50, or more than 60 indexes are because of the
following reasons:

* Indexes are added without any single person
watching over what's being done (there are too
many cooks in the kitchen).

* Index recommendations are made by a variety
of tools, but no one is analyzing what's there
before they approve (and create) the recommended
indexes.
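
As a rough way to spot tables that have drifted past that range, a query along the following lines counts the indexes on each user table in the current database. This is only a sketch: it relies on the sys.tables and sys.indexes catalog views, and the threshold of 15 simply echoes the rule of thumb above rather than any hard limit.

```sql
-- Count the indexes on each user table in the current database.
-- index_id = 0 represents a heap, so it is excluded from the count.
SELECT
    SCHEMA_NAME(t.[schema_id]) AS [Schema Name],
    t.[name] AS [Table Name],
    COUNT(*) AS [Index Count]
FROM sys.tables AS t
JOIN sys.indexes AS i ON i.[object_id] = t.[object_id]
WHERE i.[index_id] > 0
GROUP BY t.[schema_id], t.[name]
HAVING COUNT(*) >= 15   -- assumed threshold; adjust for your workload
ORDER BY [Index Count] DESC;
```

A high count is a prompt to review the table for redundant or consolidatable indexes, not proof that any particular index should be dropped.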


Paul S. Randal


Paul S. Randal (paul@SQLskills.com)
and Kimberly L. Tripp (kimberly@SQLskills.com)
are a husband-and-wife
team who own and run SQLskills.com, a

I recommend starting with a small number of
indexes and slowly (and iteratively) adding indexes
that are recommended. However, never add an index
without checking your existing indexes to see if you
can consolidate or remove an existing index first. This
is a bit more challenging (and I'll write a few posts on
this topic soon). Here's the step-by-step process I like
to follow for creating indexes (this process is especially
helpful for developers):

1. Create an effective clustering key. To better
understand the factors that make up a good
clustering key, see “Changing the Definition of a
Clustered Index” at
www.sqlmag.com/blogs/sql-server-questions-answered/sql-server-questions-answered/tabid/1977/entryid/12749/Default.aspx,
“Isn't the Clustering Key Redundant?” at
www.sqlmag.com/blogs/sql-server-questions-answered/sql-server-questions-answered/tabid/1977/entryid/12852/Isnt-the-Clustering-Key-Redundant.aspx,
“Where Does that Clustering Key Go?” at
www.sqlmag.com/blogs/sql-server-questions-answered/sql-server-questions-answered/tabid/1977/entryid/12837/Where-Does-the-Clustering-Key-Go.aspx,
and “What Happens If I Drop A Clustered Index?” at
www.sqlmag.com/blogs/sql-server-questions-answered/sql-server-questions-answered/tabid/1977/entryid/12748/Default.aspx.

2. Create your primary key (if you decided
that the primary key you want isn't suited to
being the clustering key, for example, if it's a
GUID).

3. Create your unique keys. 

4. Add foreign keys when appropriate and 
manually create nonclustered indexes on your 
foreign keys. SQL Server has never automati- 
cally created indexes on foreign keys and these 
can help in several ways. They can help improve 
the performance of the relationship as well as 
some joins. There are often better indexes for 
many joins, but this is always a good way to 
start. 

5. Add a few nonclustered indexes for the
columns that are very selective and used frequently
in queries (e.g., LastName, FirstName,
MiddleInitial). If you're doing a lot of customer
lookups by name, then a single index that combines
the columns in the order of what's most
requested can be added. If the combination is
highly selective, SQL Server can use this index to
help navigate to the data faster. If the column or
combination of columns isn't highly selective, the
index isn't as likely to be used (when the query is
requesting other columns of data). This is where
things start to get more challenging, and I'll have
to tackle these issues in another post. Do not add
too many indexes.

6. Slowly and iteratively add only the most 
important and critical indexes based on workload 
analysis. 
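
The first five steps can be sketched in T-SQL roughly as follows. Everything here is hypothetical and for illustration only: the dbo.Customer and dbo.CustomerOrder tables, their columns, and the index and constraint names are assumptions, and the identity column is just one possible choice of clustering key.

```sql
-- Hypothetical tables for illustration only.
CREATE TABLE dbo.Customer
(
    CustomerID    int IDENTITY(1,1) NOT NULL,
    CustomerGuid  uniqueidentifier  NOT NULL DEFAULT NEWID(),
    Email         nvarchar(256)     NOT NULL,
    LastName      nvarchar(50)      NOT NULL,
    FirstName     nvarchar(50)      NOT NULL,
    MiddleInitial nchar(1)          NULL
);

-- Step 1: the clustering key (here an identity column, by assumption).
CREATE UNIQUE CLUSTERED INDEX CL_Customer_CustomerID
    ON dbo.Customer (CustomerID);

-- Step 2: the primary key is a GUID, so it is enforced by a nonclustered index.
ALTER TABLE dbo.Customer
    ADD CONSTRAINT PK_Customer PRIMARY KEY NONCLUSTERED (CustomerGuid);

-- Step 3: unique keys.
ALTER TABLE dbo.Customer
    ADD CONSTRAINT UQ_Customer_Email UNIQUE (Email);

-- Step 4: a foreign key, plus a manually created index on the foreign key column.
CREATE TABLE dbo.CustomerOrder
(
    OrderID      int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_CustomerOrder PRIMARY KEY CLUSTERED,
    CustomerGuid uniqueidentifier NOT NULL
        CONSTRAINT FK_CustomerOrder_Customer
        REFERENCES dbo.Customer (CustomerGuid),
    OrderDate    datetime NOT NULL
);

CREATE NONCLUSTERED INDEX IX_CustomerOrder_CustomerGuid
    ON dbo.CustomerOrder (CustomerGuid);

-- Step 5: one wider index for frequent, selective lookups by name.
CREATE NONCLUSTERED INDEX IX_Customer_Name
    ON dbo.Customer (LastName, FirstName, MiddleInitial);
```

Step 6 has no fixed script; the indexes added there depend on what workload analysis shows for your own system.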


A Safe Method for Moving a 
Database to a New Location 


I've just taken over a SQL Server system and
found that the previous owner installed all the
databases on the C drive, including the 800GB
production database, which has only a single data
file. I'd like to move this database to a new location
and create some more file groups for it. Can you
suggest a safe methodology for doing so?

You're absolutely right that having all the databases
installed on the same drive, shared with the OS and
the page file, is a recipe for poor performance, especially
for a database that's so large. There are a variety
of methods for moving the database to a new location,
but I'll explain what I think is the most efficient
and safest method.

First, you'd like to create some new file groups and
move some of the tables to them. You can do that
by creating the file groups and using the CREATE
INDEX ... WITH (DROP_EXISTING = ON) syntax.
This syntax rebuilds the index and lets you specify
the new file group location of the index after it has
been rebuilt, and you can even perform this operation
online when using the Enterprise Edition. You can
do this for the tables themselves (assuming they have
clustered indexes) and the nonclustered indexes.
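
As a minimal sketch of that approach, the following script adds a file group with one file and then rebuilds an existing clustered index onto it. The file group name, file path, sizes, and the dbo.Sales table with its CL_Sales_SalesID index are all assumptions made for illustration; the index name and key columns must match an index that already exists.

```sql
-- Hypothetical names throughout: file group FG_Data1, the file path,
-- and the dbo.Sales table with its existing clustered index.
ALTER DATABASE paulsdb ADD FILEGROUP FG_Data1;

ALTER DATABASE paulsdb
ADD FILE
(
    NAME = N'paulsdb_data1',
    FILENAME = N'D:\SQLData\paulsdb_data1.ndf',
    SIZE = 100GB,
    FILEGROWTH = 8GB
)
TO FILEGROUP FG_Data1;

USE paulsdb;

-- Rebuilding the clustered index on the new file group moves the table itself.
-- ONLINE = ON requires Enterprise Edition; omit it on other editions.
CREATE UNIQUE CLUSTERED INDEX CL_Sales_SalesID
    ON dbo.Sales (SalesID)
    WITH (DROP_EXISTING = ON, ONLINE = ON)
    ON FG_Data1;
```

The same pattern, with CREATE NONCLUSTERED INDEX, would apply to each nonclustered index you want to relocate.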

Once all the user tables and indexes have been
moved out of the primary file group (with its single
data file), you can reclaim the empty space by performing
a DBCC SHRINKFILE operation on the
data file. The system tables should already be at the
logical start of the data file, and given that all the user
tables have already been moved to other file groups,
there's no danger of causing index fragmentation
on them because of the shrink operation. (For more
information, see my blog post “Why you should not
shrink your data files” at
www.sqlskills.com/blogs/paul/post/why-you-should-not-shrink-your-data-files.ASPX.)
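
A shrink of the primary data file might look something like the following. The logical file name paulsdb and the 10GB target are assumptions; the first query is there so you can look up the real logical name and current size before shrinking anything.

```sql
USE paulsdb;

-- Find the logical name and current size of each file.
-- The size column is in 8KB pages, so size * 8 / 1024 gives megabytes.
SELECT [name], [physical_name], [size] * 8 / 1024 AS [Size MB]
FROM sys.database_files;

-- Shrink the primary data file to a target size given in MB.
-- The logical name 'paulsdb' and the 10240MB target are assumptions.
DBCC SHRINKFILE (N'paulsdb', 10240);
```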

Now you're ready to move the primary data file to 
a new location. The safest procedure for doing so isn't 
to detach/attach the database, and the most efficient 
procedure isn't to backup and restore the database. 
Instead, the data file can be moved very efficiently 
and safely, but with some downtime involved. The 
following is the method for doing so: 

1. Set the database offline using ALTER 
DATABASE paulsdb SET OFFLINE. 

2. Copy the data file to the new location. Do
not move the data file because this means there
would only be a single copy of the file.

3. Copy the log file to the new location.

4. Inform SQL Server of the new location
of the primary data file and the log file using
ALTER DATABASE to specify the new location
of each file.

5. Set the database online using ALTER 
DATABASE paulsdb SET ONLINE. 
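
Put together, the five steps could look roughly like this. The logical file names (paulsdb and paulsdb_log) and the target paths are assumptions; the first query shows how to check the real names and current locations on your system.

```sql
-- Check the logical file names and current paths first.
SELECT [name], [physical_name]
FROM sys.master_files
WHERE [database_id] = DB_ID(N'paulsdb');

-- Step 1: take the database offline.
ALTER DATABASE paulsdb SET OFFLINE;

-- Steps 2 and 3: copy (do not move) the data and log files to the new
-- location using the operating system, outside of SQL Server.

-- Step 4: point SQL Server at the new locations (assumed names and paths).
ALTER DATABASE paulsdb
    MODIFY FILE (NAME = N'paulsdb', FILENAME = N'E:\SQLData\paulsdb.mdf');
ALTER DATABASE paulsdb
    MODIFY FILE (NAME = N'paulsdb_log', FILENAME = N'F:\SQLLogs\paulsdb_log.ldf');

-- Step 5: bring the database back online.
ALTER DATABASE paulsdb SET ONLINE;
```

If the database fails to come online, the original files are still in place, so the MODIFY FILE statements can be repeated with the old paths.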


This mechanism preserves the file in the old
location so that if anything goes wrong, you have a
fall-back strategy. By performing the index moves
before the data file move, you limit the amount of
data copying being performed. I would also perform
a full database backup before the operations (for easy
disaster recovery if anything goes wrong) as well as
after the operations so that any disaster recovery
doesn't mean you have to perform the entire data
migration again.
SQL Server is a very complex product. And
when it comes to performance tuning, many
DBAs simply don't know where to start. There
are many facets to the program and many things to
consider if you are to correctly tune a SQL Server
instance. Performance tuning is definitely one of
those areas in which experience is the best teacher.
Well, you need to start somewhere. And, as is common
to many things, it all starts by having a good foundation
to build upon.

We are going to get back to the basics of perfor- 
mance tuning by learning 10 tips that will help you get 
off on the right foot when you try to identify existing 
performance issues and prevent future ones. 

Before we start, take a look at the “SQL Server
2005 Waits and Queues” white paper on the SQL
Server CAT (Customer Advisory Team) website
(sqlcat.com/whitepapers/archive/2007/11/19/sql-server-2005-waits-and-queues.aspx).
It contains example
code and much more detail than I can provide here. I
highly recommend that you read this white paper and
use it as a reference for this article.


Every time SQL Server tries to do something but gets
held up for any reason, it tracks the incident in the
form of something known as wait statistics. (See the
CAT white paper for more information about wait
statistics.) This is one of the areas of SQL Server that
you must begin to understand to correctly determine
the resources that SQL Server is waiting on at any
given moment. For example, if you find that most
of your waits are PAGEIOLATCH_XX waits,
you can be pretty sure that your bottleneck has to do
with I/O. And if you see many LCK_M_XX type waits
occur, you're seeing a blocking issue. In this case, you
would spend your time more wisely by investigating
the cause of the blockage instead of by looking at
the I/O issues.

The CAT white paper contains plenty of detail 
about the different wait types and code examples for 
capturing and deciphering the results. After you have 
read the white paper, examine your waits from time 
to time so that you can stop chasing your tail and 
start narrowing down where the real bottleneck lies. 
This is, without a doubt, where you should start your 
performance tuning day. 
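
For a quick first look, something along these lines lists the top waits from the sys.dm_os_wait_stats DMV. It's a bare-bones sketch, not the white paper's code: it doesn't filter out the benign background wait types that the white paper discusses, so expect some noise near the top of the list.

```sql
-- Top waits by total wait time since the instance last started
-- (or since the wait statistics were last cleared).
SELECT TOP (10)
    [wait_type],
    [waiting_tasks_count],
    [wait_time_ms],
    [signal_wait_time_ms],
    [wait_time_ms] - [signal_wait_time_ms] AS [resource_wait_time_ms]
FROM sys.dm_os_wait_stats
WHERE [waiting_tasks_count] > 0
ORDER BY [wait_time_ms] DESC;
```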


I/O bottlenecks are one of the key reasons why performance
suffers in SQL Server. You have three relatively
easy methods at your disposal to determine whether
you have I/O issues:

* Check whether you see high PAGEIOLATCH_XX waits
or WRITELOG waits in your wait statistics.

* Use the DMF sys.dm_io_virtual_file_stats()
to locate any areas in which you have excessive
physical I/O or excessive stalls on that I/O. These
issues can occur at the database level or even at
the file level.

* Use the trusty PerfMon counters. At a minimum,
use the Avg. Disk sec/Read and Avg. Disk
sec/Write counters to see the latency of the
reads and writes. On an OLTP system, you
would, ideally, want to see log file latency
to be just a few milliseconds (ms) and data
file latency to be less than 10ms. Remember
that these are ideal values. Your system may
tolerate larger latency and still be fine. Also
keep in mind that many times when you find
that the storage subsystem cannot keep up
with the current demand, the cause might not
be an I/O bottleneck at all. It might, instead,
be poorly tuned queries or a lack of proper
indexes.
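
The second method can be sketched as follows. The query joins sys.dm_io_virtual_file_stats() to sys.master_files to put a file name next to each set of counters; the average stall columns are derived here for convenience and are cumulative since the instance started, so treat them as a rough guide rather than a current latency reading.

```sql
-- Cumulative I/O counts and stalls per database file, worst total stall first.
-- NULLIF avoids a divide-by-zero error for files with no reads or writes.
SELECT
    DB_NAME(vfs.[database_id]) AS [DB Name],
    mf.[physical_name] AS [File Name],
    vfs.[num_of_reads],
    vfs.[num_of_writes],
    vfs.[io_stall_read_ms] / NULLIF(vfs.[num_of_reads], 0) AS [Avg Read Stall ms],
    vfs.[io_stall_write_ms] / NULLIF(vfs.[num_of_writes], 0) AS [Avg Write Stall ms]
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
    ON mf.[database_id] = vfs.[database_id]
   AND mf.[file_id] = vfs.[file_id]
ORDER BY vfs.[io_stall] DESC;
```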


When you find that you have many physical I/O 
bottlenecks occurring, your first instinct should be 
to find the queries that are causing all the physi- 
cal I/O, and then try to tune them before you add 
more hardware. One performance aspect that you 
should never ignore is high latency for log writes. 
If you start to hold up the writing to the log file, 
all further DML operations can quickly become 
impeded, and they'll remain so until you alleviate 
the bottleneck. High latency in log writes is a sure 
way to hinder performance in the database. For 
a better understanding of storage issues and I/O 
subsystems in a SQL Server environment, I recom- 
mend that you read the Microsoft articles “Storage 
Top 10 Practices" (technet.microsoft.com/en-us/ 
library/cc966534.aspx) and "SQL Server Best Prac- 
tices Article" (technet.microsoft.com/en-us/library/ 
cc966412.aspx). 


Tip 3: Root Out Problem 
Queries 

In any given SQL Server instance, there are prob- 
ably 8 to 10 queries or stored procedures that are 
responsible for 80 to 90 percent of the poorly tuned 
activity that you see throughout the day. Naturally, 
this isn't true for everyone, but it's true often enough 
to matter. If you can identify these problem queries, 
and if you can prioritize tuning them, you can make 
a significant impact on overall performance of your 
server. 

The CAT white paper contains information
about this subject. But one way that you can easily
identify expensive statements is by using the code in
Listing 1. The sys.dm_exec_query_stats DMV
contains aggregate performance statistics that are
associated with each of the cached query plans in the
SQL Server instance. This query easily enables you
to order the results in several ways, depending on
what type of resource usage you want to concentrate
on. The key is to weigh the number of times that the
statement was executed versus the average resource
usage to better determine which statements would
have the largest impact on performance if they were
optimized better.

I caution you not to put too much weight on the
total elapsed time or on the overall duration of the 
statement because other factors, such as blocking, 
can influence the overall duration. But by using this 
query, you should be able to quickly identify the top 
offending statements in your system and prioritize the 
statements so that you can tune them as efficiently 
as possible. 


Tip 4: Plan to Reuse 
Today's database applications deal with an ever- 
increasing transaction rate as hardware gets more 
powerful and less expensive. For this reason, you 
are often trying to pack more and more into a single 
server, thus increasing the activity. But one of the 
most expensive operations in terms of memory and 
CPU usage in the SQL Server instance is the compiling
of query plans. We have also seen an increase in
applications that may speed development, but that
often result in calls to the database that make it nearly
impossible to reuse the query plan. I'm talking about
applications that do not use stored procedures or that
do not correctly parameterize their queries. This can
result in extremely poor performance, especially as
the number of transactions increases. I highly recommend
that you also read the “Plan Caching in SQL
Server 2008” white paper at
msdn.microsoft.com/en-us/library/ee343986.aspx as an excellent source
for describing how SQL Server handles query plans
and reuse.


LISTING 1: The sys.dm_exec_query_stats DMV Query for Performance Statistics

SELECT
    COALESCE(DB_NAME(t.[dbid]), 'Unknown') AS [DB Name],
    ecp.objtype AS [Object Type],
    t.[text] AS [Adhoc Batch or Object Call],
    -- Offsets are in bytes of Unicode text, hence the division by 2
    SUBSTRING(t.[text], (qs.[statement_start_offset]/2) + 1,
        ((CASE qs.[statement_end_offset]
            WHEN -1 THEN DATALENGTH(t.[text])
            ELSE qs.[statement_end_offset] END
          - qs.[statement_start_offset])/2) + 1) AS [Executed Statement]
    , qs.[execution_count] AS [Counts]
    , qs.[total_worker_time] AS [Total Worker Time],
      (qs.[total_worker_time] / qs.[execution_count]) AS [Avg Worker Time]
    , qs.[total_physical_reads] AS [Total Physical Reads],
      (qs.[total_physical_reads] / qs.[execution_count]) AS [Avg Physical Reads]
    , qs.[total_logical_writes] AS [Total Logical Writes],
      (qs.[total_logical_writes] / qs.[execution_count]) AS [Avg Logical Writes]
    , qs.[total_logical_reads] AS [Total Logical Reads],
      (qs.[total_logical_reads] / qs.[execution_count]) AS [Avg Logical Reads]
    , qs.[total_clr_time] AS [Total CLR Time],
      (qs.[total_clr_time] / qs.[execution_count]) AS [Avg CLR Time]
    , qs.[total_elapsed_time] AS [Total Elapsed Time],
      (qs.[total_elapsed_time] / qs.[execution_count]) AS [Avg Elapsed Time]
    , qs.[last_execution_time] AS [Last Exec Time],
      qs.[creation_time] AS [Creation Time]
FROM sys.dm_exec_query_stats AS qs
JOIN sys.dm_exec_cached_plans ecp ON qs.plan_handle = ecp.plan_handle
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS t
-- Uncomment one ORDER BY to rank by the resource you care about
-- ORDER BY [Total Worker Time] DESC
-- ORDER BY [Total Physical Reads] DESC
-- ORDER BY [Total Logical Writes] DESC
-- ORDER BY [Total Logical Reads] DESC
-- ORDER BY [Total CLR Time] DESC
-- ORDER BY [Total Elapsed Time] DESC
ORDER BY [Counts] DESC

The CAT white paper also goes into some detail 
about this subject. But the following code example is 
a quick and easy way to determine which statements 
are not reusing query plans: 


SELECT b.[cacheobjtype], b.[objtype], b.[usecounts],
    a.[dbid], a.[objectid], b.[size_in_bytes], a.[text]
FROM sys.dm_exec_cached_plans AS b
CROSS APPLY sys.dm_exec_sql_text(b.[plan_handle]) AS a
ORDER BY [usecounts] DESC


This query sorts all of the plans in the procedure 
cache in descending order of use counts. The use 
counts column is incremented every time a plan is 
reused, and it lets you easily identify which plans 
have reuse. You can also order the plans by the text 
column to determine which statements have many 
similar entries that have a use count of one. This 
value indicates statements that you call often but 
that do not reuse the existing plan. After you have 
identified these statements, you can prioritize which 
parts of the application you must work on first to 
get the biggest bang for your buck in terms of plan 
reuse. Don't underestimate how seriously a lack of 
plan reuse can affect performance as the transaction 
rate increases. 
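
The variation described above, ordering by the text column to group similar single-use statements together, might look like this. Restricting the results to the Adhoc object type is an assumption on my part to cut down the noise; remove that filter if you also want to see prepared statements.

```sql
-- Single-use ad hoc plans, ordered by statement text so that
-- near-identical statements that differ only in literals sit together.
SELECT b.[usecounts], b.[objtype], b.[size_in_bytes], a.[text]
FROM sys.dm_exec_cached_plans AS b
CROSS APPLY sys.dm_exec_sql_text(b.[plan_handle]) AS a
WHERE b.[usecounts] = 1
  AND b.[objtype] = N'Adhoc'
ORDER BY a.[text];
```

Long runs of rows that differ only in a literal value are good candidates for parameterization.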


Tip 5: Monitor Index Usage 

The sys.dm_db_index_operational_stats() DMF is
a widely underutilized source of information. It
can provide valuable information about your index
usage. By using this DMF, you can decipher all kinds
of information to determine not only which indexes
are used but also how they're used. For example, are
you scanning the index or are you using seeks? This
DMF will tell you. It will even tell you such things as
the time elapsed for processes, such as latching and
locking. Have you ever asked yourself, “Is this index
being used?” or, “Which indexes am I using for a given
table?” We've all asked these questions at one time or
another. So you can use this DMF to get a handle on
your index usage.
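
As a starting point, a query like the following pulls a handful of the DMF's columns for every index on the user tables in the current database. The choice of columns and the sort order are assumptions made for illustration; the DMF exposes many more counters than are shown here, and its numbers cover only the period since the index metadata was last cached.

```sql
-- Scan, lookup, modification, and wait figures per index in the current database.
-- Partitioned indexes return one row per partition.
SELECT
    OBJECT_NAME(ios.[object_id]) AS [Table Name],
    i.[name] AS [Index Name],
    ios.[range_scan_count],
    ios.[singleton_lookup_count],
    ios.[leaf_insert_count] + ios.[leaf_update_count]
        + ios.[leaf_delete_count] AS [Leaf Modifications],
    ios.[row_lock_wait_in_ms],
    ios.[page_latch_wait_in_ms],
    ios.[page_io_latch_wait_in_ms]
FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
JOIN sys.indexes AS i
    ON i.[object_id] = ios.[object_id]
   AND i.[index_id] = ios.[index_id]
WHERE OBJECTPROPERTY(ios.[object_id], 'IsUserTable') = 1
ORDER BY ios.[range_scan_count] + ios.[singleton_lookup_count] DESC;
```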


Tip 6: Separate Data and Log 
Files 

One of the most basic but often disregarded rules 
for good performance is to separate the data and the 
log files onto separate physical drive arrays when- 
ever possible. This is especially true when you use 
DAS, but it also applies to a SAN. The key principle 
here is to separate the mostly random access of the 
data files from the sequential access that occurs by 
writing to the transaction logs. One aspect familiar 
to a SAN environment is that even though you are 
presented with different drive letters or LUNs, you 
cannot be sure that these represent different physical 
drive arrays. Often, these apparent drives are carved 
from the same larger disk array, and this will defeat 
the intent of separating them in the first place. So 
make sure that you know what you're really getting 
when you ask for your storage on a SAN. You will be 
amazed at how much difference this can make as the 
volume of your transactions increases. 


Tip 7: Use Separate Staging 
Databases 

I see far too many production databases that are also
used for scratch space when it comes to importing or
exporting data. People frequently import data into
a table that is real but temporary in nature. Then,
they proceed to manipulate the data by performing
extensive updates to prepare it for one of the final
production tables. The problem with this is that most
production databases are in full recovery mode. This
means that practically all activity is fully logged in
the transaction log, and these operations can be
expensive.

By using a separate staging database within the 
SQL Server instance that is in simple recovery mode, 
you can achieve several performance benefits. One is 
that you can often get a minimally logged load instead 
of a fully logged load. This can dramatically speed up 
the import process to begin with. With a minimally 
logged load, the amount of data that's logged to the 
transaction log is very small in relation to what it 
would be for a fully logged load, so you have a much 
lighter burden on the server as a whole. 

If you were to perform these operations in the
production database, all that activity would have
to use resources such as I/O, CPU, and memory.
All the preparation would be fully logged in either
database; but at least in the staging database, you
would not have to back up the log. If this was done
in the production database instead, that activity
would increase the time required to back up the
log. Therefore, the resulting backup files would be
larger than necessary. Create a staging database, and
perform your initial importing and manipulation in
that database to reduce the overall burden on the
server and to minimize the impact on the production
databases.
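
Setting up such a database takes only a couple of statements. In this sketch the name StagingDB is hypothetical, and the file sizes and locations are left at the instance defaults, which you would normally override.

```sql
-- StagingDB is a hypothetical name; files are created with default settings.
CREATE DATABASE StagingDB;

-- Switch the staging database to the simple recovery model.
ALTER DATABASE StagingDB SET RECOVERY SIMPLE;

-- Confirm the recovery model of each database on the instance.
SELECT [name], [recovery_model_desc]
FROM sys.databases
ORDER BY [name];
```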


Tip 8: Pay Attention to Log 
Files 

Too many people underestimate the importance of
the transaction log file in relation to performance. The
most common mistake people make is not to leave
enough free space in the transaction log file to allow
for your normal operations to occur without forcing
an auto-grow operation. Growing the log file can be
extremely time-consuming. It can force any DML
operations to wait until the growth is complete before
the operation can proceed.

By making sure that you always have plenty of free
space in the log file, you can avoid these performance
hits altogether. The second most common mistake is
to have far too many virtual log files (VLFs) in the
transaction log. For an excellent explanation of this
practice, see Kimberly L. Tripp's blog post “Transaction
Log VLFs - too many or too few?”
(www.sqlskills.com/BLOGS/KIMBERLY/post/Transaction-Log-VLFs-too-many-or-too-few.aspx).
Personally, I prefer
to have the VLFs set at 512MB each. You can obtain
this value by growing the log file in 8GB increments.
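
To see how many VLFs a log currently has and to grow it by one 8GB step, you could use something like the following. The database name paulsdb, the logical log file name paulsdb_log, and the assumed current log size of 8GB are placeholders for illustration.

```sql
USE paulsdb;

-- DBCC LOGINFO returns one row per VLF in the current database's log,
-- so the number of rows is the VLF count. It is an undocumented command.
DBCC LOGINFO;

-- Grow the log by one 8GB step. SIZE is the new total size, not the
-- increment, so this assumes the log is currently 8GB.
ALTER DATABASE paulsdb
    MODIFY FILE (NAME = N'paulsdb_log', SIZE = 16GB);
```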


Tip 9: Minimize tempdb 
Contention 

If your application makes heavy use of tempdb, it's 
possible that you could run into some contention 
regarding internal structures that are associated with 
the tempdb files. There are ways to minimize the con- 
tention, outlined in the TechNet topic “Working with 
tempdb in SQL Server 2005" at technet.microsoft 
.com/en-us/library/cc966545.aspx. 

This topic goes into great detail about many
aspects of tempdb. However, I want to point out that
the article recommends that you create one tempdb
data file for each processor core. In many cases, this
is excessive and can actually degrade performance.
What I recommend is that you keep an eye on the
PAGELATCH_UP waits for tempdb, and that you
increase the number of files until the wait either ends
or is close to zero. Just remember to make all the
files the exact same size; otherwise, you don't experience
any benefit because of the allocation algorithm
that's based on the amount of free space in each file.
Another thing to keep in mind is not to explicitly drop
temporary tables in your code if you can help it. Let
SQL Server do that on its own. In this way, you can
maximize the effect of the caching enhancements that
are available in recent versions of SQL Server.
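
One way to watch for this contention and then add a file is sketched below. The first query shows only the tasks waiting at the moment you run it, so it needs to be sampled repeatedly under load; the new file's name, path, and sizes are assumptions and should be matched to your existing tempdb data files.

```sql
-- Tasks currently waiting on page latches in tempdb (database ID 2).
-- resource_description has the form database_id:file_id:page_id.
SELECT [session_id], [wait_type], [wait_duration_ms], [resource_description]
FROM sys.dm_os_waiting_tasks
WHERE [wait_type] LIKE N'PAGELATCH%'
  AND [resource_description] LIKE N'2:%';

-- Check that the existing tempdb data files are the same size.
SELECT [name], [size] * 8 / 1024 AS [Size MB]
FROM tempdb.sys.database_files
WHERE [type_desc] = N'ROWS';

-- Add another data file that matches the existing files.
-- The name, path, and sizes here are assumptions.
ALTER DATABASE tempdb
ADD FILE
(
    NAME = N'tempdev2',
    FILENAME = N'T:\TempDB\tempdev2.ndf',
    SIZE = 4GB,
    FILEGROWTH = 512MB
);
```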


Tip 10: Change the MAX 
Memory Limit 

There have been improvements in the 64-bit versions
of SQL Server regarding memory allocation and
sharing with the OS and other applications. But I
have yet to see a case where leaving the MAX memory
setting at the default is ideal in real life. Even though
your host server may be dedicated to SQL Server,
there are always other applications or parts of the
OS that require memory from time to time or even
all the time. Do yourself a favor and set the MAX
memory setting to at least 1GB to 2GB less than the
total amount of memory on the server if this is a
single instance. If you have multiple instances, or
if you are in a multi-instance cluster, you also have
to account for that. How much memory you leave
depends on what else you have running and how
much memory it requires to operate efficiently. But
you can adjust this value up or down over time as
you see fit.
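Here's a minimal sketch of changing the setting with sp_configure. It assumes a hypothetical dedicated, single-instance server with 16GB of RAM on which you decide to leave 2GB for the OS; the 14336 figure is only that illustration, not a recommendation for your server.

```sql
-- 'max server memory (MB)' is an advanced option, so expose advanced options first.
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;

-- Assumed scenario: 16GB server, leave 2GB for the OS => (16 - 2) * 1024 = 14336MB.
EXEC sys.sp_configure N'max server memory (MB)', 14336;
RECONFIGURE;
```

The change takes effect without a restart, which makes it practical to adjust the value up or down as you observe the server.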


Bonus Tip: Just Say No to Shrinking Data Files

OK, this makes 11 tips. But shrinking data files has
been a general bad practice for a long time, and it can
affect performance in one of three ways. First, the
shrink operation itself can be very painful. Second,
because it can cause a lot of fragmentation, your
subsequent queries may suffer as a result. Third, if you
don't have Instant File Initialization turned on, the
resultant growth later can also hinder performance and
potentially cause timeouts. Although there are times when
shrinking a file may be necessary, make sure that you
know the impact before you try it. For more details,
see Paul Randal's blog post "Why you should not
shrink your data files"
(www.sqlskills.com/BLOGS/PAUL/post/Why-you-should-not-shrink-your-data-files.aspx).
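One way to gauge the fragmentation impact is to measure it before and after any shrink you can't avoid. The following sketch uses the sys.dm_db_index_physical_stats function in LIMITED mode against the current database; the 1,000-page cutoff is an arbitrary assumption to filter out small indexes.

```sql
-- Fragmentation per index in the current database (LIMITED mode is the cheapest scan).
SELECT OBJECT_NAME(ips.object_id) AS table_name,
       i.name AS index_name,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, N'LIMITED') AS ips
  JOIN sys.indexes AS i
    ON i.object_id = ips.object_id
   AND i.index_id = ips.index_id
WHERE ips.page_count > 1000   -- arbitrary cutoff; small indexes rarely matter
ORDER BY ips.avg_fragmentation_in_percent DESC;
```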


Make These Tips Your Own 
By following these guidelines and keeping this 
information in mind, you should be able to iden- 
tify the most common performance problems 
in SQL Server—and prevent or minimize future 
ones, as well. By now, you will have noticed that 
some of these topics will require more reading and 
some actual experience for you to fully grasp the 
concepts and techniques. But none of these topics 
are out of reach for the average DBA. And even 
a beginner has to start somewhere. So why not 
here?
InstantDoc ID 128932
Pivoting Data

Identify pivoting elements and put those elements into the code


Pivoting data is a technique that rotates data
from a state of rows to a state of columns, possibly
aggregating multiple source values into
the same target row and column intersection. Some
practical reasons to pivot data include formatting data
for reporting purposes, calculating custom aggregations,
and solving relational division problems. We've
covered pivoting in the past (see the Learning Path at
www.sqlmag.com, InstantDoc ID 128846, for SQL
Server Magazine articles about pivoting), but based on
the number of pivoting questions I continue to receive,
I think a refresher is in order. Even experienced T-SQL
developers have a hard time getting used to the syntax
of the native PIVOT and UNPIVOT operators that
were introduced in SQL Server 2005; many developers
resort to using more traditional syntax even though it
tends to be much more verbose.

In this article I discuss standard traditional tech- 
niques for pivoting, as well as nonstandard techniques 
that rely on the newer native PIVOT operator. I also 
explain the advantages and disadvantages of the 
PIVOT operator. Finally, I discuss common challenges 
associated with pivot queries and their solutions. 

For sample data I use a database called
InsideTSQL2008. You can download the source code
to create the sample data from
www.InsideTSQL.com/books/source_code/InsideTSQL2008.zip.


Pivoting Phases and a Standard Solution

My first example to demonstrate pivoting involves 
shipping information from the Sales.Orders table for 
orders shipped in Spain. The following query returns 
the source data that you'll rely on: 


USE InsideTSQL2008;

SELECT shipperid, shipcity, freight
FROM Sales.Orders
WHERE shipcountry = N'Spain'
ORDER BY shipperid, shipcity;


Table 1 shows this query's output. 
Suppose that you get a task to return a row for 
each shipper ID, a column for each shipping city in 
Spain, and the total freight values for each shipper
and city intersection. Table 2 shows the desired result. 
This task represents a classic pivoting need and is 
therefore a good example to demonstrate the concept 
and solutions. 

If you want to break a pivoting task into multiple 
conceptual phases, you can identify three main phases: 
grouping, spreading, and aggregating. Web Figure 1 
illustrates these phases. 

From a user perspective, pivoting involves three 
main elements: the element that you need on rows 
(shipperid in our case), the element that you need on 
columns (shipcity in our case), and the element that 
you need in the data section, or the intersection of 
rows and columns (sum of freight in our case). 

The first phase illustrated in Web Figure 1 is the 
grouping phase. You need to arrange one row in the 
result for each distinct "on rows" element. In SQL
terms, this is called grouping; the element that you 
need to group by (the on rows element) is shipperid. 
In the standard traditional solution to pivoting, the 
grouping phase is implemented simply with a 
GROUP BY clause, like so: 


GROUP BY shipperid 


The second phase in pivoting is the spreading 
phase. You need to spread the values that will later 
be aggregated (freight in our case) to target columns 
based on the distinct values that appear in the "on
cols" element (shipcity in our case). The distinct ship
city values that appear in our sample data for the ship 
country Spain are Barcelona, Madrid, and Sevilla. In 
the traditional solution to pivoting, you can achieve 
the spreading logic using CASE expressions, like so: 


CASE WHEN shipcity = N'Barcelona' THEN freight END AS Barcelona,
CASE WHEN shipcity = N'Madrid'    THEN freight END AS Madrid,
CASE WHEN shipcity = N'Sevilla'   THEN freight END AS Sevilla


Note that when an ELSE clause isn't specified in
a CASE expression, an ELSE NULL is implied.
Because that's exactly the behavior we need,
you can simply omit this part, as I did. Also note
that in order to use a solution that's based on
a static query, you have to know ahead of time
what the distinct values are in the spreading element,
or at least what the maximum number of
distinct values is. Later in the article I cover a
solution that relies on string concatenation
techniques and dynamic SQL to cope with situations
in which this information isn't available.

TABLE 1: Source Data from Sales.Orders

shipperid  shipcity   freight
1          Barcelona    18.69
1          Barcelona     1.36
1          Madrid       16.16
1          Madrid       32.99
1          Madrid        2.94
1          Madrid       12.69
1          Sevilla      38.19
1          Sevilla       3.20
1          Sevilla      59.11
2          Barcelona    10.14
2          Madrid       77.92
2          Madrid        8.29
2          Madrid       97.09
2          Sevilla     175.32
2          Sevilla      19.58
2          Sevilla     107.83
2          Sevilla      51.87
3          Barcelona     1.25
3          Barcelona     6.54
3          Madrid        7.56
3          Sevilla       4.32
3          Sevilla      85.46
3          Sevilla      23.39

The third and last phase in pivoting is the
aggregating phase. In this phase you aggregate
the data values in each target pivoted column.
In our example, the aggregate function is
SUM and the aggregation element is freight.
In the traditional solution to
pivoting, you simply apply the aggregate function to
the results of the CASE expressions, like so:


SUM(CASE WHEN shipcity = N'Barcelona' THEN freight END) AS Barcelona,
SUM(CASE WHEN shipcity = N'Madrid'    THEN freight END) AS Madrid,
SUM(CASE WHEN shipcity = N'Sevilla'   THEN freight END) AS Sevilla


Here's the complete solution implementing all 
three phases: 


SELECT shipperid,
  SUM(CASE WHEN shipcity = N'Barcelona' THEN freight END) AS Barcelona,
  SUM(CASE WHEN shipcity = N'Madrid'    THEN freight END) AS Madrid,
  SUM(CASE WHEN shipcity = N'Sevilla'   THEN freight END) AS Sevilla
FROM Sales.Orders
WHERE shipcountry = N'Spain'
GROUP BY shipperid;


TABLE 2: Total Freights for Each Shipper and City

shipperid  Barcelona  Madrid  Sevilla
1              20.05   64.78   100.50
2              10.14  183.30   354.60
3               7.79    7.56   113.17
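To see the implied ELSE NULL at work without the sample database, here's a small self-contained sketch. The @Orders table variable and its five rows are made up for illustration (the freight values are borrowed from Table 1), and the multi-row VALUES clause assumes SQL Server 2008 or later. A shipper with no orders for a city gets NULL in that column; wrapping the aggregate in COALESCE is one way to show zero instead.

```sql
-- Hypothetical stand-in for Sales.Orders, limited to the three pivoting elements.
DECLARE @Orders TABLE
(
  shipperid INT          NOT NULL,
  shipcity  NVARCHAR(15) NOT NULL,
  freight   MONEY        NOT NULL
);

INSERT INTO @Orders(shipperid, shipcity, freight) VALUES
  (1, N'Barcelona', 18.69),
  (1, N'Barcelona',  1.36),
  (1, N'Madrid',    16.16),
  (3, N'Madrid',     7.56),
  (3, N'Sevilla',    4.32);

SELECT shipperid,
  SUM(CASE WHEN shipcity = N'Barcelona' THEN freight END) AS Barcelona,
  SUM(CASE WHEN shipcity = N'Madrid'    THEN freight END) AS Madrid,
  SUM(CASE WHEN shipcity = N'Sevilla'   THEN freight END) AS Sevilla,
  -- Same aggregate, but an empty intersection is reported as 0 instead of NULL.
  COALESCE(SUM(CASE WHEN shipcity = N'Sevilla' THEN freight END), 0) AS SevillaOrZero
FROM @Orders
GROUP BY shipperid
ORDER BY shipperid;
```

Shipper 1 has no Sevilla rows in this data, so its Sevilla column is NULL and SevillaOrZero is 0.00; shipper 3 has no Barcelona rows, so its Barcelona column is NULL.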


You can consider this solution as a template. For
any pivoting task you get, identify the three elements
involved and put them in the right places
in this template. To recap, in our example those
elements are:

• Grouping: shipperid
• Spreading: shipcity IN (Barcelona, Madrid, Sevilla)
• Aggregation: SUM(freight)
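As an illustration of reusing the template with different elements, the following sketch swaps the roles: it groups by shipcity, spreads by shipperid (1, 2, 3), and counts orders instead of summing freight. The @Orders table variable, its rows, and the shipper1 through shipper3 column names are hypothetical.

```sql
-- Hypothetical sample rows; only the three pivoting elements are needed.
DECLARE @Orders TABLE
(
  shipperid INT          NOT NULL,
  shipcity  NVARCHAR(15) NOT NULL,
  freight   MONEY        NOT NULL
);

INSERT INTO @Orders(shipperid, shipcity, freight) VALUES
  (1, N'Barcelona', 18.69),
  (1, N'Barcelona',  1.36),
  (1, N'Madrid',    16.16),
  (2, N'Madrid',     8.29),
  (3, N'Sevilla',    4.32);

-- Grouping: shipcity; Spreading: shipperid IN (1, 2, 3); Aggregation: COUNT(freight)
SELECT shipcity,
  COUNT(CASE WHEN shipperid = 1 THEN freight END) AS shipper1,
  COUNT(CASE WHEN shipperid = 2 THEN freight END) AS shipper2,
  COUNT(CASE WHEN shipperid = 3 THEN freight END) AS shipper3
FROM @Orders
GROUP BY shipcity
ORDER BY shipcity;
```

Because COUNT(expression) ignores NULLs, an empty intersection comes back as 0 here rather than NULL, which is a difference from SUM worth keeping in mind when you pick the aggregate.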


As for optimization, the general indexing guide- 
lines are to create an index with a key-list starting 
with the grouping element, followed by the spreading 
element, and including the aggregation element. If 
the query contains an additional filter—for example, 
to support a pivoting page field in the application— 
make it the leading column in the key-list. In our case 
the shipcountry column serves such a filtering pur- 
pose. Following these guidelines, here's the optimal 
index to support our pivoting solution: 


CREATE INDEX idx_nc_country_sid_city_i_freight
  ON Sales.Orders(shipcountry, shipperid, shipcity)
  INCLUDE(freight);


Web Figure 2 shows the plan for the solution 
query. The plan performs a seek operation based 
on the shipcountry filter in our index, followed by a 
partial ordered scan of the qualifying rows. The plan 
then calculates the CASE expressions in the Compute 
Scalar operator. Finally, the plan handles the aggrega- 
tion logic using a stream aggregate operator that relies 
on the ordered scan of the index. This plan is very 
efficient, scanning only the qualifying rows, only once, 
and in an order that supports an efficient aggregate. 

As an aside, standard SQL has a certain feature 
that was probably designed to support pivoting needs, 
although SQL Server doesn't include this feature—it’s 
a filtering clause called FILTER that's available to 
aggregate functions, allowing the function to operate 
only on a subset of rows in the group instead of all 
rows. Pivoting data using the FILTER clause would 
look like this (don't try to run this code, because SQL
Server doesn't support it): 


SELECT shipperid,
  SUM(freight) FILTER (WHERE shipcity = N'Barcelona') AS Barcelona,
  SUM(freight) FILTER (WHERE shipcity = N'Madrid')    AS Madrid,
  SUM(freight) FILTER (WHERE shipcity = N'Sevilla')   AS Sevilla
FROM Sales.Orders 
WHERE shipcountry = N'Spain' 
GROUP BY shipperid; 


But as I showed earlier, it's very easy to implement 
similar logic with CASE expressions. 


Pivoting with the Native PIVOT Operator

In the previous section, I covered the traditional solu- 
tion to pivoting, using only standard constructs. One 
of the downsides of this technique is that it's verbose. 
When you have a large number of distinct values in 
the spreading element, the code can get very long. 
SQL Server 2005 introduced a native nonstandard 
table operator called PIVOT that enables less verbose 
solutions. Very much like the JOIN table operator, 
you specify PIVOT in the FROM clause of a query, 
and the output of this table operator is a table result. 
The syntax of the PIVOT operator is 


FROM «source table»
  PIVOT( «aggregate function»(«aggregation element»)
    FOR «spreading element»
      IN («distinct spreading values») ) AS «alias»


In this syntax you can identify two out of the three 
elements that are supposed to be involved in piv- 
oting: the aggregation function and element, and the 
spreading element and the distinct spreading values. 
The tricky part is that the grouping element is implicitly
defined by elimination: the grouping element
is all columns from «source table» besides those 
mentioned as either the spreading or aggregation ele- 
ment. Therefore, you should refrain from querying 
underlying tables directly, and always prepare a table 
expression in the form of a CTE or derived table that 
contains only the three elements that are supposed to 
be involved in pivoting. This way, the right grouping 
element that you need will be defined by elimination. 
With this in mind, here's a query using the native 
PIVOT operator that returns total freight values for 
each shipper and city for orders shipped in Spain: 


WITH PivotInput AS
(
  SELECT shipperid, shipcity, freight
  FROM Sales.Orders
  WHERE shipcountry = N'Spain'
)
SELECT *
FROM PivotInput
  PIVOT( SUM(freight)
    FOR shipcity IN (Barcelona, Madrid,
      Sevilla) ) AS PivotOutput;


Note that because the city names happen to be
strings that are considered valid regular identifiers in 
our case, you don't need to delimit them. But if the 
values were such that they weren't considered regular 
identifiers (e.g., if they started with a digit), you would 
need to delimit them using either square brackets or 
double quotes. For example, if the spreading values 
represented order years, you'd use the form 


FOR orderyear IN ([2006], [2007], [2008])


As I mentioned, even experienced T-SQL developers 
have a hard time getting used to this syntax—but all 
you need to do is identify the three elements involved 
in pivoting (grouping, spreading, aggregating) and put 
those elements in the right places in the template. 

Web Figure 3 shows the plan for the PIVOT query. 
As you can see, this plan is very similar to that of the 
standard solution—so much so that if you look at the 
properties of the Aggregate operator, under Defined 
Values, you'll find that SQL Server constructed CASE 
expressions behind the scenes: 


[Expr1022] = Scalar Operator(SUM(CASE
  WHEN [InsideTSQL2008].[Sales].[Orders].[shipcity]=N'Barcelona'
  THEN [InsideTSQL2008].[Sales].[Orders].[freight]
  ELSE NULL END)),


With this in mind, you shouldn't expect the solu- 
tion that's based on the PIVOT operator to perform 
better than the standard solution. The main benefit 
in the PIVOT operator at the moment is that it's less 
verbose. In fact, there are several ways in which the 
PIVOT operator is more limited than the standard 
method. I describe these limitations, as well as work- 
arounds, in the following sections. 
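
For comparison, the standard (CASE-based) counterpart of the preceding PIVOT query can be sketched as follows. It assumes the same Sales.Orders table and the three Spanish cities used throughout this article. Given the CASE expressions that appear in the plan, the two forms should be processed in much the same way.

```sql
-- Standard pivoting: one CASE expression per distinct spreading value.
-- Grouping element: shipperid; spreading element: shipcity;
-- aggregation element: freight.
SELECT shipperid,
  SUM(CASE WHEN shipcity = N'Barcelona' THEN freight END) AS Barcelona,
  SUM(CASE WHEN shipcity = N'Madrid'    THEN freight END) AS Madrid,
  SUM(CASE WHEN shipcity = N'Sevilla'   THEN freight END) AS Sevilla
FROM Sales.Orders
WHERE shipcountry = N'Spain'
GROUP BY shipperid;
```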


Pivoting Challenges 

Certain common pivoting needs create challenges 
because of syntax limitations or other reasons. 
One such challenge occurs when the spreading ele- 
ment doesn't originally exist in the source data. For 
example, given the sample data used earlier in the 
article, suppose that you need to return for each cus- 
tomer a column for each of its order IDs (e.g., for the 
five most recent orders). The grouping element ("on
rows" element) is custid. The aggregation element
(data element) is orderid. But there's no spreading 
element that's common across customers by which 
you can spread the order IDs. 

The solution is to assign a row number to the orders, 
partitioned by custid, and ordered by whatever order 
that you want to return those values. For example, if 
you want to return the order IDs from most recent
to least recent, you can order by orderdate DESC,
orderid DESC. As I mentioned earlier, in a static pivot
solution, you need to decide on the maximum number 
of spreading values that you want to support (e.g., in 
our case, the five most recent orders). 

The transition of order IDs from source to target 
rows is a one-to-one transition, so the use of an aggregate
isn't really intuitive. Still, the only way for you to
return the data element is using an aggregate function, 
because pivoting relies on grouped data. So you simply 
need to pick an artificial aggregate that will return the 
only applicable orderid value (if at all) for each customer 
and row number. Both MIN and MAX are aggregate 
functions that will work well for this need, because they 
support many types; and in case there's only one non- 
NULL value, they simply return that value. 

Here's the complete solution returning the order 
IDs of the five most recent orders for each customer, 
for orders shipped in Spain: 


WITH PivotInput AS 
( 
  SELECT custid, orderid,
    ROW_NUMBER() OVER(PARTITION BY custid
                      ORDER BY orderdate DESC, orderid DESC)
      AS rownum
FROM Sales.Orders 
WHERE shipcountry = N'Spain' 
) 
SELECT custid, [1],[2],[3],[4],[5], 
CASE WHEN [6] IS NOT NULL THEN 'Y' ELSE 'N' END 
AS moreorders 
FROM PivotInput 
PIVOT(MAX(orderid) FOR rownum 
IN([1],[2],[3],[4],[5],[6])) AS PivotOutput; 


This solution also indicates whether the customer has
additional orders beyond the five most recent ones.
Table 3 shows this query's output.


TABLE 3: Most Recent Order IDs for Each Customer
for Orders Shipped in Spain

custid  1      2      3      4      5      moreorders
8       10970  10801  10326  NULL   NULL   N
29      10928  10887  10568  10426  10366  N
30      11037  11009  10948  10911  10888  Y
69      11013  10917  10306  10282  10281  N


Limitations and Solutions

The PIVOT operator has a few limitations that aren't
applicable to the standard pivoting solution. One
limitation is that the PIVOT operator doesn't support
expressions in the aggregation and spreading elements
directly, whereas the standard pivoting solution
does. For example, suppose you want to aggregate a
discounted freight value defined by the expression
freight * 0.9. With the standard pivoting solution you
can simply specify the expression where you normally
specify the aggregation element after the THEN
clause of the CASE expressions, like so:


SUM(CASE WHEN shipcity = N'Barcelona' THEN
  freight * 0.9 END) AS Barcelona


But if you try to specify an expression in the aggrega- 
tion element of the PIVOT operator—for example, 
SUM(freight * 0.9)—you get an error. The work- 
around is to assign column aliases to calculations in 
a table expression and use those aliases in the PIVOT 
operator. For example, the following query shows 
how to aggregate a discounted freight: 


WITH PivotInput AS 
( 
SELECT shipperid, shipcity, freight * 0.9 AS 
discountedfreight 
FROM Sales.Orders 
WHERE shipcountry = N'Spain' 
) 
SELECT * 
FROM PivotInput 
PIVOT( SUM(discountedfreight) FOR shipcity IN 
(Barcelona, Madrid, Sevilla) 
) AS PivotOutput; 


Another limitation is that the PIVOT operator 
doesn't support the COUNT(*) aggregate. One work- 
around is to apply the COUNT aggregate to a non- 
NULLable column—and if there's a column, it makes 
sense to use the same one that you use as the spreading 
element (provided that it's a non-NULLable column).
This way, you can more easily cover the query with an 
index. Here's an example returning the count of orders 
for each shipper and city, for orders shipped in Spain: 


WITH PivotInput AS
(
  SELECT shipperid, shipcity
  FROM Sales.Orders
  WHERE shipcountry = N'Spain'
)
SELECT *
FROM PivotInput
  PIVOT( COUNT(shipcity) FOR shipcity IN
    (Barcelona, Madrid, Sevilla)
  ) AS PivotOutput;


Another workaround is to apply the aggregate to a 
column generated by a constant expression, like so: 


WITH PivotInput AS
(
  SELECT shipperid, shipcity, 1 AS const
  FROM Sales.Orders
  WHERE shipcountry = N'Spain'
)
SELECT *
FROM PivotInput
  PIVOT( COUNT(const) FOR shipcity IN (Barcelona,
    Madrid, Sevilla)
  ) AS PivotOutput;


A third limitation is that the PIVOT operator 
doesn't support multiple spreading elements, whereas 
the standard technique does. For example, suppose 
you want to spread the data by both shipped year and 
ship city. With the standard method you simply specify 
two predicates in the CASE expression, like so: 


SUM(CASE WHEN YEAR(shippeddate) = 2006 AND 
shipcity = N'Barcelona' THEN freight END) AS 
Barcelona 


The workaround using the PIVOT operator is to 
concatenate the multiple spreading elements into one 
element in the table expression that you're preparing 
as the input to the PIVOT operator. Then, specify 
the alias of the concatenated values as the spreading 
element, and the distinct concatenated values in the 
IN clause. Here's an example: 


WITH PivotInput AS
(
  SELECT
    shipperid,
    CAST(YEAR(shippeddate) AS NCHAR(4)) + N'_' +
      shipcity AS year_city,
    freight
  FROM Sales.Orders
  WHERE shipcountry = N'Spain'
)
SELECT *
FROM PivotInput
  PIVOT( SUM(freight) FOR year_city IN (
    [2006_Barcelona], [2006_Madrid], [2006_Sevilla],
    [2007_Barcelona], [2007_Madrid], [2007_Sevilla],
    [2008_Barcelona], [2008_Madrid], [2008_Sevilla] ) ) AS PivotOutput;


Table 4 shows this query's output. 




Yet another limitation is related to the spreading ele- 
ment when using the PIVOT operator. Remember that 
the spreading values that you specify in the IN clause are 
column values in the source and become attribute names 
in the target. Hence they must follow the requirements 
for identifiers, which means they can't exceed 128 char- 
acters. There's no way to alias the target column names 
directly in the IN clause; instead, you can alias them in 
the query's SELECT list, but that's too late. With the
standard syntax for pivoting, you specify the spreading 
values in the filters of the CASE expressions, and you 
assign any alias that you like directly in the expression, 
so the source values aren't limited to 128 characters. 
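
As a minimal sketch of aliasing in the SELECT list, the following query renames the pivoted columns after the PIVOT operator has produced them. The alias names (freight_barcelona and so on) are illustrative only. Note that the values in the IN clause must still qualify as identifiers, which is why this renaming doesn't lift the 128-character restriction on the source values.

```sql
WITH PivotInput AS
(
  SELECT shipperid, shipcity, freight
  FROM Sales.Orders
  WHERE shipcountry = N'Spain'
)
SELECT shipperid,
  Barcelona AS freight_barcelona,  -- illustrative alias
  Madrid    AS freight_madrid,     -- illustrative alias
  Sevilla   AS freight_sevilla     -- illustrative alias
FROM PivotInput
  PIVOT( SUM(freight)
    FOR shipcity IN (Barcelona, Madrid, Sevilla) ) AS PivotOutput;
```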

The last limitation I want to cover is that the 
PIVOT operator doesn't support multiple aggregates, 
whereas the standard technique does. For example, 
suppose you need to calculate both the sum and 
average freight for each shipper by city. Using the 
standard method, you simply add more expressions 
to the SELECT list, like so: 


SUM(CASE WHEN shipcity = N'Barcelona' THEN
  freight END) AS SUM_Barcelona,

AVG(CASE WHEN shipcity = N'Barcelona' THEN
  freight END) AS AVG_Barcelona,


The workaround using the PIVOT operator is to use 
a separate operator for each aggregate and join the 
results, like so: 


WITH PivotInput AS 
( 
SELECT shipperid, shipcity, freight 
FROM Sales.Orders 
WHERE shipcountry = N'Spain' 
) 
SELECT P1.shipperid,
  P1.Barcelona AS SUM_Barcelona, P1.Madrid AS SUM_Madrid,
  P1.Sevilla AS SUM_Sevilla,
  P2.Barcelona AS AVG_Barcelona, P2.Madrid AS AVG_Madrid,
  P2.Sevilla AS AVG_Sevilla
FROM PivotInput PIVOT( SUM(freight) FOR
    shipcity IN (Barcelona, Madrid, Sevilla) ) AS P1
  JOIN PivotInput PIVOT( AVG(freight) FOR
    shipcity IN (Barcelona, Madrid, Sevilla) ) AS P2
    ON P1.shipperid = P2.shipperid;


TABLE 4: Total Freights for Each Shipper and Year City

shipperid  2006_Barcelona  2006_Madrid  2006_Sevilla  2007_Barcelona  2007_Madrid  2007_Sevilla  2008_Barcelona  2008_Madrid  2008_Sevilla
1          NULL            15.63        NULL          18.69           NULL         NULL          1.36            49.15        100.50
2          10.14           11.92        107.83        NULL            97.09        NULL          NULL            8.29         246.77
3          NULL            7.56         NULL          6.54            NULL         89.78         S               NULL         23.39


LISTING 1: Dynamic Pivot Example


TABLE 5: Shipping Information 
Е Cities in the UK 


4195 15705 53688 
р 13608 73906 — 
3 746 — 1052 — MD 


However, this method doesn't perform as well as the 
standard method because the data is scanned sepa- 
rately for each aggregate. 


Dynamic Pivoting 

A final challenge or limitation of the static solu- 
tions to pivoting is such a commonly asked request 
that it deserves its own section. With static solutions 
to pivoting, you have to know ahead of time what 
the unique spreading values are—or at least the 
maximum number. For example, when the spreading 
element is the ship city, you need to know which cities 
currently exist in the data. At the moment, for orders 
shipped in Spain, the Sales.Orders table contains 
orders shipped in the cities Barcelona, Madrid, and 
Sevilla. But the set of distinct cities could change— 
for example, because of new orders. To come up 
with a pivoting solution that dynamically adjusts the 
spreading elements based on the existing data, you 
need to use dynamic SQL. 

First, you need a technique that performs string
concatenation so that you concatenate the distinct
ship cities you query at run time to one list that you'll
use later in the PIVOT operator's IN clause. An
efficient method to perform string concatenation is
to use the FOR XML PATH option with an empty
string as input, indicating that you're not interested
in a root node, and return the values as text nodes
with no tags. For example, here's the code you'd use
to concatenate the distinct Spain ship cities:


SELECT STUFF(
  (SELECT N',' + QUOTENAME(shipcity)
   FROM (SELECT DISTINCT shipcity
         FROM Sales.Orders
         WHERE shipcountry = N'Spain') AS Cities
   ORDER BY shipcity
   FOR XML PATH(''), TYPE).value('.[1]',
     'NVARCHAR(MAX)'), 1, 1, N'');


The innermost query returns the distinct ship cities 
in Spain. The outer query uses the QUOTENAME 
function to turn the city names into identifiers, adding 
square brackets and a comma in front of each city. 
The FOR XML PATH('') option concatenates the
elements into one string, and finally, the STUFF 
function is used to get rid of the unnecessary first 
comma. The result of this code is the string 


[Barcelona],[Madrid],[Sevilla]


Listing 1 contains the definition of a stored
procedure called PivotShipping that accepts a shipping
country name as input, constructs the complete
PIVOT query dynamically, including the technique
shown to concatenate the distinct ship city names,
then executes it using sp_executesql.

Here's an example of invoking the procedure 
with Spain as the input country, producing the output
in Table 2: 


EXEC dbo.PivotShipping @country = N'Spain';


And here's an example of invoking the procedure with 
UK as the shipping country, returning the output in 
Table 5: 


EXEC dbo.PivotShipping @country = N'UK'; 
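
A procedure along the lines described for Listing 1 might look like the following sketch. It isn't the article's listing: the name PivotShippingSketch, the NVARCHAR(15) parameter type, and the early return when no rows match are all assumptions made for illustration. The steps are the ones described above: concatenate the quoted city names, splice them into the IN clause, and run the result with sp_executesql, passing the country as a parameter rather than concatenating it into the string.

```sql
CREATE PROCEDURE dbo.PivotShippingSketch  -- hypothetical name
  @country AS NVARCHAR(15)                -- assumed type of shipcountry
AS
BEGIN
  SET NOCOUNT ON;

  DECLARE @cols AS NVARCHAR(MAX), @sql AS NVARCHAR(MAX);

  -- Build the list of quoted spreading values, e.g. [Barcelona],[Madrid]
  SET @cols = STUFF(
    (SELECT N',' + QUOTENAME(shipcity)
     FROM (SELECT DISTINCT shipcity
           FROM Sales.Orders
           WHERE shipcountry = @country) AS Cities
     ORDER BY shipcity
     FOR XML PATH(''), TYPE).value('.[1]', 'NVARCHAR(MAX)'), 1, 1, N'');

  -- No orders for this country: there is nothing to pivot
  IF @cols IS NULL RETURN;

  -- Only the column list is concatenated; the country stays a parameter
  SET @sql = N'WITH PivotInput AS
(
  SELECT shipperid, shipcity, freight
  FROM Sales.Orders
  WHERE shipcountry = @country
)
SELECT *
FROM PivotInput
  PIVOT( SUM(freight) FOR shipcity IN (' + @cols + N') ) AS PivotOutput;';

  EXEC sp_executesql
    @stmt = @sql,
    @params = N'@country AS NVARCHAR(15)',
    @country = @country;
END;
```

Using QUOTENAME for the city names and a parameter for the country keeps user-supplied text out of the executable part of the dynamic string.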


Easier than You Thought 
Although SQL pivoting techniques have been around 
for a while, developers still ask a lot of questions in 
classes and public forums about how to achieve piv- 
oting. Pivoting really isn't very difficult —once you get 
the idea, it just entails identifying pivoting elements 
in a given request and putting those elements in the 
correct places in the code. If you've found the PIVOT 
operator intimidating in the past, I hope this article 
helps make you feel more comfortable with it. Next 
month I'll cover techniques to unpivot data.
InstantDoc ID 128846


Configuring SQL Server Service Broker

Use T-SQL to take advantage of asynchronous data processing


When the SQL Server Service Broker was
introduced in SQL Server 2005, it added
a new weapon to the SQL Server arsenal:
a technique known as asynchronous data processing.
Although it isn’t new to the programming world, it’s 
new to the database world. During asynchronous 
data processing, a command is executed by a client, 
but the client doesn’t wait for the command to be 
completed. 

The Service Broker offers guaranteed message 
delivery in the order in which the messages are sent, 
provided that the messages are sent within a single 
conversation. However, messages that are sent in dif- 
ferent conversations can be received out of order if an 
earlier message takes longer to arrive. This typically 
happens when messages are sent between servers over 
a slow network link. Regardless of the order in which 
they’re sent, messages are processed in the order in 
which they’re received. They’re always processed one 
time and only one time, within the confines of the 
conversation in which they’re sent. 


Using Object Types to Configure the Service Broker
The Service Broker doesn’t have a UI for configura- 
tion, so you configure it through T-SQL. When you 
configure the Service Broker, you must create the 
following six types of objects: 

* Message 

* Contract

* Queue 

* Service 

* Route 

* Endpoint 


Each of these object types fulfills a specific function 
within the Service Broker, and the objects must be 
created in the correct order. Most objects should be 
created in pairs in which one object is the source of 
the communication and one is the destination of the 
communication. 

The data that's sent within a message can be of any 
type of data. The data is kept in a binary state while 
it's in transit and while it's stored in the queue. When
data is sent via the Service Broker, it's typically sent
within an XML document for the most flexibility.

Before you can use the Service Broker, you must
enable it by using the ALTER DATABASE command.
By default, when you create or restore a database, the
Service Broker is disabled. The ALTER DATABASE
command has two switches that you use together
with the Service Broker: NEW_BROKER and
ENABLE_BROKER. The NEW_BROKER switch
is used to create a new Service Broker in the database.
If you already have the Service Broker enabled, the
NEW_BROKER switch removes any messages that
are in flight and closes any conversations that are
open. If you use the ENABLE_BROKER switch,
any messages that are in flight when the database is
backed up will be allowed to continue, and any
conversations that are already open will be allowed to
continue. You can usually get away with using only
the ENABLE_BROKER switch.


ALTER DATABASE sample
  SET NEW_BROKER;


ALTER DATABASE sample
  SET ENABLE_BROKER;


You can check whether the Service Broker is enabled
by checking the is_broker_enabled column of the
sys.databases catalog view.
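
As a quick illustration, a query such as the following returns the broker state for one database. The database name sample is the one used in the ALTER DATABASE examples; substitute your own.

```sql
-- is_broker_enabled = 1 means the Service Broker is enabled;
-- service_broker_guid identifies this database's broker instance.
SELECT name, is_broker_enabled, service_broker_guid
FROM sys.databases
WHERE name = N'sample';
```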

Before you can send and receive messages by 
using the Service Broker, you have to set a database 
master key, if one doesn't already exist in the data- 
base. To do this, use the CREATE MASTER KEY 
statement, and specify the password for the database 
master key. 


CREATE MASTER KEY ENCRYPTION BY
  PASSWORD = 'MyPa$$w0rd';


Object Type 1: Message 

The first object to configure is the message type.
The message type tells the computer that's running SQL
Server what sort of validation is required for the
data that's being sent within the message. Message
types are created by using the CREATE MESSAGE
TYPE statement. When you use this statement, you
tell the message type what kind of validation should
be performed. Use the following code to create the
message type:


CREATE MESSAGE TYPE SampleMessageType
  AUTHORIZATION dbo
  VALIDATION = NONE;


LISTING 1: Code to Create the Service

CREATE SERVICE SampleServiceSource
  AUTHORIZATION dbo
  ON QUEUE SampleQueueSource
  (
    SampleContract
  );
CREATE SERVICE SampleServiceDestination
  AUTHORIZATION dbo
  ON QUEUE SampleQueueDestination
  (
    SampleContract
  );


Object Type 2: Contract 


The second object to configure is the contract.
The contract tells the SQL Server instance what
message types can be used within the conversation.
You can bind one or more message types within a single
contract. Contracts are created by using the CREATE
CONTRACT statement. Only a single contract
is required. However, you can use more than one
contract if you want. Multiple contracts can be used
when multiple applications need to send data into a
single queue or when there is a need to show the data
from multiple processes. Use the following code to
create the contract:


CREATE CONTRACT SampleContract
  AUTHORIZATION dbo
  (
    SampleMessageType SENT BY ANY
  );


Object Type 3: Queue


The third object to configure is the queue itself. The
queue is where the messages are stored between the
time they're sent and the time they're processed by
the receiving computer. Because the queue stores its
messages physically, it's created in a filegroup; if you
don't specify one with the ON clause, as in the following
code, the default filegroup is used. You create two
queues, one as the source and one as the destination.
After messages are processed, an acknowledgement is
sent back to the sending queue. Therefore, a separate
queue is used to hold and process these messages. You
can use the following code to create the queues:


CREATE QUEUE SampleQueueSource;


CREATE QUEUE SampleQueueDestination;


Object Type 4: Service 

The fourth object to configure is the service. The ser- 
vice binds the queue to the contract for the purposes 
of sending messages. A message is sent to a specific
service. That service is configured by the queue to 
which the message is delivered and also by the con- 
tracts that can be used to send that message. The 
contract defines which message types are available 
when you send the message. The contract is bound 
to the service. A service, in turn, is bound to a single 
queue, but it can be bound to more than one contract. 
Listing 1 shows the code to create the service. 


Object Type 5: Endpoints 

Endpoints are required for server to server communi- 
cations. Endpoints aren't required for Service Broker 
solutions that don't send messages between instances. 
The syntax for creating an endpoint is pretty straight- 
forward. You specify the IP address, the TCP port, 
how the authentication should be handled, and what 
encryption option should be used when passing data 
between the two instances. You can use the following 
code to create the endpoints: 


CREATE ENDPOINT ServiceBrokerEndpoint
  AUTHORIZATION sa
  STATE = STARTED
  AS TCP (LISTENER_PORT = 5555, LISTENER_IP = ALL)
  FOR SERVICE_BROKER (AUTHENTICATION = WINDOWS,
    ENCRYPTION = REQUIRED ALGORITHM RC4)
GO


In this sample code, the listener port is set to 5555.
This is the same port that's specified within the route.
The IP address is set to ALL because we want the
endpoint to listen on all the server's IP addresses.
This setting can be limited to a single IP address
by changing the LISTENER_IP value to an IP
address that the server uses.

You can specify either Windows Authentica- 
tion or certificate-based authentication. Windows 
Authentication can be used when the instances are 
on the same domain. Certificate-based authentication 
must be used when the machines aren't on the same 
Windows domain. 


Object Type 6: Route 

You can optionally configure a sixth object, a route,
if the source and target services aren't on the same
system. A route enables messages to flow from one
database to another, regardless of whether these databases
are stored on the same SQL Server instance, on
different instances on the same server, or on different
servers. Before you create a route between instances,
you must create a Service Broker endpoint on each
instance that's involved. You create a
route by using the CREATE ROUTE statement. On
the computer that's sending the message (SQL01, in
this example), you create a route to the destination
machine (SQL02, in this example).

In the following code, I've assumed that you've
created endpoints on both computers, and that these
endpoints are listening on TCP port 5555:


CREATE ROUTE SampleRoute
  AUTHORIZATION dbo
  WITH SERVICE_NAME = 'SampleServiceDestination',
  BROKER_INSTANCE = '1B164171-8061-4C56-AB32-8DBBE629DB38',
  ADDRESS = 'tcp://SQL02:5555';


You'll have to change the BROKER_INSTANCE
value to match the value from the
service_broker_guid column of the sys.databases
catalog view on the remote server
to which you're creating the route.

On the destination server, you must also create a 
route pointing back to the source server: 


CREATE ROUTE SampleRoute
  AUTHORIZATION dbo
  WITH SERVICE_NAME = 'SampleServiceSource',
  BROKER_INSTANCE = '1B164171-8061-4C56-AB32-8DBBE629DB38',
  ADDRESS = 'tcp://SQL01:5555';


When you create the routes, the service name of the 
local service is case sensitive regardless of what the 
database collation is set to. This is because the Ser- 
vice Broker does a bit-level comparison between the 
name that’s entered and the name of the service on 
the instance. If the case doesn’t match, the messages 
won't flow across the route. 

The BROKER_INSTANCE setting is an optional
field. You can leave it blank or you can configure it
by using the broker instance ID value from the
sys.databases catalog view on the remote computer.
If database mirroring is used on the computer at 
the other end of the route, you can include the 
MIRROR_ADDRESS parameter to tell the route 
the location of the database mirror. If you don’t 
include the MIRROR_ADDRESS parameter, and 
if the database mirror fails over, the messages don’t 
move to the destination database. This is because 
the route won’t have the information to locate the 
database mirror. 
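
As a hedged sketch of a mirror-aware route, the following statement adds MIRROR_ADDRESS to the earlier example. The route name SampleRouteMirrored and the mirror server SQL03 are hypothetical, and the GUID is the placeholder used in the earlier examples. When MIRROR_ADDRESS is specified, the route must also specify SERVICE_NAME and BROKER_INSTANCE.

```sql
CREATE ROUTE SampleRouteMirrored          -- hypothetical route name
  AUTHORIZATION dbo
  WITH SERVICE_NAME = 'SampleServiceDestination',
  BROKER_INSTANCE = '1B164171-8061-4C56-AB32-8DBBE629DB38',  -- placeholder GUID
  ADDRESS = 'tcp://SQL02:5555',           -- principal server
  MIRROR_ADDRESS = 'tcp://SQL03:5555';    -- hypothetical mirror server
```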


Creating a Conversation Dialog 

Sending messages doesn't require much T-SQL code.
Before you send a message, you must first create
a conversation dialog to identify the conversation
over which you send the messages from the source to
the destination. You create the dialog by using
the BEGIN DIALOG statement. After you create the
conversation dialog, the dialog is assigned a GUID
that's used to differentiate the conversation from all
other conversations.

After you create the conversation dialog, you can 
send messages over the conversation by using the 
SEND statement. In the following example, we'll take 
the contents of the sys.objects catalog view and put it 
into the XML document. Listing 2 shows the code to 
create the conversation dialog. 


Processing the Messages 

At this point, the message has been sent and can be 
viewed by querying the queue object directly by using 
the SELECT statement. You can view the data in the 
queue by running 


SELECT * FROM [SampleQueueDestination] 


You can look at the messages in the queue without 
causing them to process by using the SELECT state- 
ment. To process the messages, use the RECEIVE 
statement to obtain the contents of the message and 
to remove the message from the queue. 
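
When you only want to peek at the queue, it can help to select a few named columns and cast the binary body for readability. The following is a minimal sketch; it assumes the message bodies are well-formed XML, as in this article's example, because the cast fails otherwise.

```sql
-- Reading from the queue with SELECT doesn't remove any messages.
SELECT conversation_handle,
  message_type_name,
  CAST(message_body AS XML) AS message_body_xml
FROM SampleQueueDestination;
```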

You can receive either a single message at a time
using the code in Listing 3, or you can receive multiple
messages using the code in Listing 4. After you
process all messages in a conversation, use the END
CONVERSATION statement to close the conversation
so that no other message can be sent on the
conversation. After the END CONVERSATION
statement runs and closes the conversation, the
conversation can't be reopened. (Note that if there aren't
any messages in the destination queue, you'll receive
an error message stating that the conversation handle
is missing.) After you process the values from the
@Messages table, close the conversation as required.


LISTING 2: Code to Create the Conversation Dialog

DECLARE @conversation_handle UNIQUEIDENTIFIER;
BEGIN DIALOG @conversation_handle
  FROM SERVICE [SampleServiceSource]
  TO SERVICE 'SampleServiceDestination'
  ON CONTRACT [SampleContract];

DECLARE @message_body XML;
SET @message_body = (SELECT * FROM sys.objects FOR
  XML AUTO, ROOT('root'));

SEND ON CONVERSATION @conversation_handle
  MESSAGE TYPE [SampleMessageType]
  (@message_body);


LISTING 3: Code to Process a Single Message

DECLARE @message_body XML;
DECLARE @conversation_handle UNIQUEIDENTIFIER;
RECEIVE TOP (1) @conversation_handle =
    conversation_handle,
  @message_body = CAST(message_body AS XML)
  FROM SampleQueueDestination;
END CONVERSATION @conversation_handle;


LISTING 4: Code to Process Multiple Messages

DECLARE @Messages TABLE
  (conversation_handle UNIQUEIDENTIFIER,
   message_body VARBINARY(MAX));

RECEIVE TOP (100) conversation_handle, message_body
  FROM SampleQueueDestination
  INTO @Messages;


LISTING 5: Code to Create the Receiving Queue

After the END CONVERSATION statement, 
you'll also see a message in the source queue. You 
have to receive this message and perform an END 
CONVERSATION procedure at the source end of the 
original message. This confirms for the Service Broker 
that the conversation is complete and that all the meta- 
data about the conversation can be cleaned up. 
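
The source-side cleanup described here can be sketched as follows. This is an illustration rather than one of the article's listings: it receives one message from SampleQueueSource and ends the conversation when that message is the system EndDialog message type.

```sql
DECLARE @conversation_handle UNIQUEIDENTIFIER;
DECLARE @message_type_name sysname;

-- Pick up the acknowledgement that arrived in the source queue
RECEIVE TOP (1)
    @conversation_handle = conversation_handle,
    @message_type_name = message_type_name
  FROM SampleQueueSource;

-- End our side of the dialog once the other side has ended it
IF @message_type_name =
   N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog'
  END CONVERSATION @conversation_handle;
```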

Most typically, you'll want to configure the
receiving code within a stored procedure. This will let
you configure the queue to run the stored procedure
automatically when messages are received. You do so
by configuring an activation procedure on the queue.
After the stored procedure is created, the queue
should be modified by using the ALTER QUEUE
statement. Listing 5 shows the code to configure the
receiving queue.

When you configure the MAX_QUEUE_READERS
setting, you can configure the SQL
Server system to run anywhere from 0 to 32,767
parallel processes. The greater the number of parallel
processes that you have running, the faster that
messages can be processed. However, you should
run no more than one parallel process per CPU
core on the server so that you don't overload the
CPUs.
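
An activation setup of the kind described above might look like the following sketch. The procedure name ProcessSampleQueue, the one-second timeout, and the value of 4 for MAX_QUEUE_READERS are assumptions chosen for illustration, not values taken from the article's listing.

```sql
CREATE PROCEDURE dbo.ProcessSampleQueue  -- hypothetical name
AS
BEGIN
  SET NOCOUNT ON;
  DECLARE @conversation_handle UNIQUEIDENTIFIER,
          @message_body VARBINARY(MAX);

  WHILE 1 = 1
  BEGIN
    SET @conversation_handle = NULL;

    -- Wait up to one second for the next message
    WAITFOR (
      RECEIVE TOP (1)
          @conversation_handle = conversation_handle,
          @message_body = message_body
        FROM SampleQueueDestination
    ), TIMEOUT 1000;

    -- Nothing arrived within the timeout: the queue is drained
    IF @conversation_handle IS NULL BREAK;

    -- Application-specific processing of @message_body goes here

    END CONVERSATION @conversation_handle;
  END;
END;
GO

ALTER QUEUE SampleQueueDestination
  WITH ACTIVATION (
    STATUS = ON,
    PROCEDURE_NAME = dbo.ProcessSampleQueue,
    MAX_QUEUE_READERS = 4,   -- assumed: no more than one per CPU core
    EXECUTE AS OWNER );
```

Looping until the queue is empty lets a single activated instance handle a burst of messages before it exits.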


The Power to Build 

When used correctly, the Service Broker can pro- 
vide an easy-to-use, asynchronous message pro- 
cessing platform. It gives you the ability to build 
a wide variety of applications that don't require 
an immediate response to the command that's 
executed.


SSIS Novices' Guide to Data Warehouses:
Moving Data into the Data Warehouse

How to create SSIS packages the wizardly way


There are two ways that data warehouses
often get built. Occasionally, someone in
the database trenches pitches the idea to the
higher-ups, but more often someone at the top issues
a directive to build one. Being the recipient of such a
directive can be frightening, especially if you have to
quickly build one and you have little or no experience
with using SQL Server Integration Services (SSIS).
Fear not: I'll guide you in building a data warehouse,
while conveying some of the philosophical
foundations of data warehousing. However, I won't
get into the esoteric topics that seasoned business
intelligence (BI) professionals like to discuss when
they geek-spar.

Instructions on how SSIS novices can build a data
warehouse are too much to cover in
just one article, so this is the first article in a series. In
this article, I'll show you the basic structure of a data
warehouse and how to make an SSIS package move
data from the source system into that structure.


Setting the Stage
Suppose you're the database developer and DBA for
a small company, StateUBags, that sells backpacks
and messenger bags online to students at your state's
major university. You built a database that enables an
online shopping-cart application. Your shopping-cart
back end is a SQL Server database named SUBTrxn
(short for StateUBags Transactions). When building
SUBTrxn, you paid careful attention to getting the
data into third-normal form. You applied the most
genius keys and indexes, so SUBTrxn has solid
referential integrity. You regularly partition and archive
old sales data to keep the tables small and light.
Besides being the database developer and DBA,
you're also the point person for providing reports.
You hate this responsibility because you're constantly
bogged down by one-off requests for data sets. You
often have to resurrect data from backups, which is
so cumbersome that report requesters receive their
data sets long after they would've been useful. You
have a folder full of SQL code that you tool and
retool constantly in an effort to speed up your
report-writing activities. You can't remember how most of
your stored procedures work, but they seem to work
well enough. You're constantly tiptoeing around land
mines in the result sets by explaining the very valid
reasons why the measures in one report don't match
the measures in another. You hope no one with any
SQL expertise ever sees the one stored procedure you
rely far too heavily on: the one with the nested
subquery pulling from a view that pulls from a view.

Recently, the boss of your boss's boss sent word 
down the command chain that he'd like you to build 
a data warehouse. You're thrilled, yet scared since 
you're not experienced with using SSIS. 

The only requirement you've been given is “The 
suits want to be able to analyze sales, in an ad-hoc 
fashion, over the last five years." Your boss assures 
you that all you have to build is the database portion. 
After you complete it, a contractor will build a SQL 
Server Analysis Services (SSAS) cube on top of it. 
You have only two weeks to build it. 

How do you go about building the database portion 
of a data warehouse? I'll walk you through the 
steps using a SQL Server 2008 R2 machine. Before 
you start, though, it helps to have a simplified repre- 
sentation of the current pre-warehouse environment. 
As you grow the current environment into a fully 
operational data warehouse, you'll be able to refer 
back to this representation as the starting point. 

Figure 1 shows the current environment for Sta- 
teUBags. As you can see, data is being exchanged 
between the SUBTrxn database and the website that 
the customers use to view and purchase products, as 
well as view their purchase history. The back end uses 
slow, fragmented SQL scripts and stored procedures 
to output ad-hoc reports. 
Step 1: Add the Databases 
The first step is to add databases that will house sets 
of tables containing raw, staged, and dimensionally 
modeled data. I'll refer to these sets as the Raw tables, 
Stage tables, and Dimensional tables, respectively. 

The Raw tables house near-exact copies of the 
source data (in terms of both the data's structure 
and values) from the transactional or other external 
system. As a general rule, data warehouse developers 
try to make the Raw tables exact copies. Data warehouse 
developers should only depart from this rule 
when there are compelling and defensible reasons to 
do so. What constitutes compelling and defensible? 
The burden of proof rests on the developer who made 
the decision. So, make your Raw tables as “raw” as 
possible, and if you decide to calculate (aka figure), 
fix, filter, flatten, or flag before the data lands in the 
Raw table, be prepared to defend your decision. 

The data in Stage tables significantly differs from 
that in the Raw tables in terms of the data’s structure 
and values. When you move the data from the Raw 
tables to the Stage tables, you apply business rules, 
flatten two or more tables into one, mark records for 
filtering, or perform other activities. In other words, 
anything you need to do to make the data ready for 
the data warehouse, you do during the move from 
Raw to Stage. 

The Dimensional tables store the data long-term 
in a format that's tailor-made for use by SSAS cube 
files. Dimensional tables can be fact tables or dimension 
tables that come together in star, snowflake, and 
constellation schemas. 

For the example used in this article, you need to 
create two new databases, SUBStg and SUBDW, 
as seen in Figure 2. The SUBStg database will contain 
the Raw and Stage tables. Placing the Raw and 
Stage tables in the same database is often practical, 
especially in the early stages of development when 
the warehouse is still small enough for that to be feasible. 
As your data warehouse grows, you might need 
separate databases for the Raw and Stage tables. You 
might even want to have these databases on separate 
servers. 

The SUBDW database will contain the Dimen- 
sional tables. These fact and dimension tables will be 
structured expressly for serving up data to be used by 
SSAS cubes. 
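
If you prefer a script to the SSMS dialog boxes, the two databases can be created with a few lines of T-SQL. This is only a minimal sketch: it assumes the default file locations and sizes are acceptable for a first pass, and it uses the database names from this example.

```sql
-- Assumption: default data/log file locations and sizes are fine for now.
IF DB_ID(N'SUBStg') IS NULL
    CREATE DATABASE SUBStg;   -- will hold the Raw and Stage tables
GO

IF DB_ID(N'SUBDW') IS NULL
    CREATE DATABASE SUBDW;    -- will hold the Dimensional (fact and dimension) tables
GO
```

The DB_ID checks make the script safe to rerun; without them, a second run would fail because the databases already exist.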


Figure 1: Documenting the current ad-hoc reporting environment 


Step 2: Check for Software 

Now that you have your databases in place, you 
need to make sure that you have the necessary soft- 
ware. First, you need to confirm that you have SSIS 
installed on your machine. In Control Panel, select 
Administrative Tools, click Services, and look for 
SQL Server Integration Services in the list. You'll be 
using an SSIS package, specifically an extraction, 
transformation, and loading (ETL) package, to 
create the tables in the data warehouse. 

Second, you need to make sure that you have 
Business Intelligence Development Studio (BIDS) 
installed. Choose All Programs on the Start menu, 
click Microsoft SQL Server 2008 R2, and look for 
Business Intelligence Development Studio in the list. 
If you don't find BIDS listed, you'll likely need to 
re-install it from the SQL Server installation DVD. 
You'll be using BIDS to edit the ETL package. 
Figure 2: Creating the databases 

Figure 3: Moving the data from the source database into the Raw tables 


Step 3: Create the Raw Package 


The next step is to create the ETL package that will 
move data from the source system (i.e., the SUBTrxn 
database) into the Raw tables, as Figure 3 shows. I'll 
refer to this ETL package as the Raw package. 

You can use the Import/Export Wizard in SQL 
Server Management Studio (SSMS) to create the Raw 
package. Although this isn't a common practice, it's a 
useful method to know even if you decide not to use 
it much in the long term. It provides an easy-to-learn, 
reliable path into SSIS development. 

SQL Server Magazine • www.sqlmag.com 

In SSMS, locate your source database (in this case, 
the transactional database) in the Object Explorer 
pane. Right-click that database, select Tasks, and 
choose Export Data to bring up the Import/Export 
Wizard. You've probably used this wizard to move 
data around in your databases, but you might not 
have saved your work as an SSIS package. 

When the wizard appears you might see an 
informational screen. If you do, click Next to pro- 
ceed to the first configuration screen. Following the 
guidelines shown in Table 1, complete the first three 
configuration screens. 

On the fourth configuration screen titled Select 
Source Tables and Views, you'll be presented with 
a list of all the tables in the database you chose as 
your source database. When you select a table's check 
box, the wizard fills in the Destination column with 
a table that has the same name. As Figure 4 shows, 
you can change the existing table's name to the name 
of a table that doesn't yet exist. The wizard will then 
create the new table for you and populate it with data 
from the source table. For this example, you need to 
add the word raw to the beginning of each table name. 
Prepending the Raw tables’ names with raw and the 
Stage tables' names with stg is an easy way to keep 
the two types of tables separate when they're stored 
in the same database. 
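
Once the tables exist, the prefixes also make it easy to take inventory. The following query is a small sketch that assumes the raw and stg prefixes described above and the SUBStg database from this example; it simply classifies each table by its name.

```sql
USE SUBStg;
GO

-- List every table in the staging database, grouped by its naming prefix.
SELECT name AS TableName,
       CASE WHEN name LIKE 'raw%' THEN 'Raw'
            WHEN name LIKE 'stg%' THEN 'Stage'
            ELSE 'Other'
       END AS TableType
FROM sys.tables
ORDER BY TableType, name;
```

Any table that shows up as Other is worth a second look, because it doesn't follow the convention.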

Try to select at least 10 tables from your source 
list. Almost any lookup table in your transactional 
database is a good candidate to become a dimen- 
sion table in your warehouse, so choose nine lookup 
tables for now. You'll want to choose at least one table 
that represents a transaction table or an event table. 
Most of the time, transactions and events eventually 
become fact tables. (I'll be covering how to make these 
decisions in a future article.) 

If you haven't used the Import/Export Wizard 
much in the past, I recommend that you click the 
Edit Mappings button after you're done selecting and 
renaming the tables. It's helpful to know about the 
options that are available to you. 

When you're finished with the Select Source Tables 
and Views screen, click Next to proceed to the Save 
and Run Package screen. Here you need to configure 
some important options: 

* Clear the Run Immediately check box and 
select the Save SSIS Package check box. At 
this point, you only want to save the package. 

* Elect to save the package in the file system 
instead of the default option of SQL Server. 

* Set the Package Protection level to Do not save 
sensitive data. 


Click Next to advance to the Save SSIS Package 
screen. Type in a name for the Raw package and use 
the Browse button to navigate to the location in the 
file system where you want to save it. (Don't save 
it inside a database.) Click Finish to save the Raw 
package as a .dtsx file. In the screen noting that the 
operation was successful, click Close. 

If you were to double-click the .dtsx file, it would 
open a window that would allow you to run the Raw 
package. However, executing the package at this 
point wouldn't be useful, so you don't need to run 
it. Instead, you need to incorporate this package in 
a project. 


Step 4: Add the Raw Package to a Project 
It's now time to create the project that will contain the 
Raw package. Begin by launching BIDS. On the File 
menu, choose New and select Project. Under Project 
Types, choose Business Intelligence Projects and select 
the Integration Services Project template. Name the 
solution as you see fit, and put it in any location that 
works for you. Click OK. 

When your newly created project opens, you'll see 
the layout BIDS uses for SSIS projects. As part of the 
automated project setup process, BIDS has created 
a package named Package.dtsx and opened it in the 
main window. There are four tabs across the top of 
this window. You'll be using the Control Flow and 
Data Flow tabs, but not the other two tabs. 


TABLE 1: Guidance for Completing the First Three Configuration Screens in the Import/Export Wizard 

Import/Export Wizard Screen Title | Guidelines for Completing the Screen 
Choose a Data Source | For Server Name, choose the server on which your transactional data resides. For Database, choose the database where your transactional data resides. 
Choose a Destination | For Server Name, choose the server you'll use for your Raw and Stage data. For Database, choose the database where you plan to place your Raw tables (SUBStg in this example). 
Specify Table Copy or Query | For this example, it's important to use the default option of Copy data from one or more tables or views. 


Figure 4: Selecting and renaming the Raw tables 

Before you can use the Control Flow and Data 
Flow tabs, though, you need to add the Raw package 
you created to the current project. In Solution 
Explorer, right-click the SSIS Packages folder and 
select Add Existing Package. In the dialog box that 
appears, select File System from the drop-down list 
for the Package location option since you saved the 
package in a file system instead of a database. Click 
the button labeled with an ellipsis (...), locate and 
select the Raw package, click Open, then click OK. 

Note that you should use the Add Existing 
Package option to add a package rather than manu- 
ally moving it into the project folder. If you manually 
move the package, it won't retain the right filename; 
BIDS will append the number 1 to it. When you use 
the Add Existing Package option and point to the 
file in a location outside of the project folder, BIDS 
makes a copy of it in the project folder without 
renaming it. 

After the package is added to your project, double- 
click it to open it in the main window. Now that this 
package is open, the main window will show details 
about it. First, take a look at the Connection Managers 
area, which is in the lower portion of the window. 
Notice that you have two connection managers: 
DestinationConnectionOLEDB and SourceConnectionOLEDB. 
Double-click each connection manager to open it 
and look around. When you have a good 
sense of what properties it contains, click Cancel. 

Now look at the Control Flow tab in the main 
window. It'll show the tasks in the package. The 
number of tasks you'll see is based on the number 
of tables you elected to copy in the Import/Export 
Wizard's Select Source Tables and Views screen: 

* If you chose five or fewer tables, you might 
only have two tasks on your Control Flow tab. 
Following along should still be possible, but if 
you find it overly confusing, you might want to 
start over and choose more tables the second 
time around. 

* If you selected six or more tables, you'll see 
that the wizard created two tasks for each 
group of five tables. 


As Figure 5 shows, the sample Raw package lists 
four tasks. The first five tables are covered by the first 
two tasks. The second five tables are covered by the 
third and fourth tasks. The two types of tasks are 
Execute SQL Tasks and Data Flow Tasks. 

Although the Import/Export Wizard already 
created the tasks, looking at their configurations is 
helpful, especially if you're new to SSIS. So, double- 
click Preparation SQL Task 1, which is an Execute 
SQL Task, to open it. Execute SQL Tasks are simple 
to set up and run. The most important settings are 
Connection, which you use to pick the connection 
manager you want to run your SQL code on, and 
SQL Statement, which you use to paste your code into 
the SQL Statement box. The connection manager is 
recorded in the Connection property and T-SQL code 
is written to the SQLStatement property. If you hover 
over or open the SQLStatement property for Prepara- 
tion SQL Task 1, you'll see that it simply creates the 
Raw tables. Close the task. 

In the Control Flow tab, right-click Preparation 
SQL Task 1 and select Execute Task. The task will 
turn yellow, then green. When you execute a task 
this way, you're automatically switched to debug- 
ging mode. To return to the design mode, select Stop 
Debugging on the Debug menu. Now, execute the 
task again. This time the task will turn yellow, then 
red. This occurs because those tables were created 
when you ran the task the first time. 
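
You can reproduce the same failure outside of SSIS to see what's going on. The sketch below uses a hypothetical rawCustomer table with made-up columns (the wizard generates the real column lists from your source tables). The second CREATE TABLE is wrapped in TRY/CATCH so that the error is returned as a row instead of stopping the script; on the systems I'm familiar with, the error reported is 2714, which means an object with that name already exists.

```sql
USE SUBStg;
GO

-- Hypothetical Raw table; the table and column names are for illustration only.
IF OBJECT_ID(N'dbo.rawCustomer', N'U') IS NULL
    CREATE TABLE dbo.rawCustomer
    (
        CustomerID   INT          NOT NULL,
        CustomerName VARCHAR(100) NULL
    );
GO

-- An unguarded CREATE TABLE for a table that already exists fails,
-- which is what makes the Execute SQL Task turn red on its second run.
BEGIN TRY
    EXEC (N'CREATE TABLE dbo.rawCustomer (CustomerID INT NOT NULL);');
END TRY
BEGIN CATCH
    SELECT ERROR_NUMBER()  AS ErrorNumber,
           ERROR_MESSAGE() AS ErrorMessage;
END CATCH;
```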

To avoid this problem, you need to change the 
code in both of the Preparation SQL Tasks. Rather 
than dropping and recreating the tables each day, let's 
truncate them. Double-click Preparation SQL Task 1 
to open its properties page. On the General tab, locate 
the SQLStatement property. When you place your 
cursor in the SQL Statement field, a button labeled 
with an ellipsis appears. Click it to launch a larger 
edit window. 

Figure 5: Examining the tasks in the Raw package 

In the SQL Statement code, replace each CREATE 
TABLE statement with a TRUNCATE TABLE statement. 
Save the edits, and return to the Control Flow 
tab. Execute the task again and watch it turn green 
this time. (If it turns red, check the syntax in your 
code.) Return to the design mode, and repeat these 
steps for Preparation SQL Task 2. 
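
As a rough illustration, the edited SQLStatement might end up looking like the following. The table names rawCustomer and rawProduct are hypothetical stand-ins; use the Raw table names that appear in your own CREATE TABLE statements.

```sql
-- Before (generated by the wizard; the column list shown is hypothetical):
--   CREATE TABLE [dbo].[rawCustomer] ([CustomerID] int NOT NULL, [CustomerName] varchar(100))
-- After: keep the table and empty it before each load.
TRUNCATE TABLE [dbo].[rawCustomer];
TRUNCATE TABLE [dbo].[rawProduct];
```

Keep in mind that TRUNCATE TABLE assumes the tables already exist, so the original CREATE TABLE statements must have run successfully once (as they did when you executed the task the first time).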

When you're done changing the code, double-click 
Data Flow Task 1 in the Control Flow tab. It will 
seem as if you're opening the task, but you'll actually 
move to the Data Flow tab in the main window. The 
Data Flow Task drop-down list near the top of this 
tab lets you move between different Data Flow Tasks 
without having to return to the Control Flow tab. (If 
you have only one Data Flow Task, the drop-down 
list will contain only one entry.) 

Take the time to explore the tasks in the Data Flow 
tab so that you become familiar with how they're con- 
figured. When you're satisfied that you understand 
their inner workings, return to the Control Flow tab, 
right-click Data Flow Task 1, and execute it. Return 
to the Data Flow tab and watch the task run as it 
gives you status information using yellow, red, and 
green indicators, as Figure 6 shows. Another key bit 
of information it provides is a row count, which can 
be helpful in more complex packages. Return to the 
design mode by stopping the debugger. 
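
If you'd like to double-check the row counts that the Data Flow tab reports, you can compare the source and Raw tables directly. This sketch assumes that SUBTrxn and SUBStg are on the same server and that a source table named Customer was copied to rawCustomer; both table names are hypothetical.

```sql
-- Compare the number of rows in a source table with its Raw copy.
SELECT 'Customer' AS TableName,
       (SELECT COUNT(*) FROM SUBTrxn.dbo.Customer)   AS SourceRows,
       (SELECT COUNT(*) FROM SUBStg.dbo.rawCustomer) AS RawRows;
```

If the source database is still taking orders while you test, the two counts can legitimately differ by a few rows.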


Figure 6: Watching Data Flow Task 1 execute 


Step 5: Test the Raw Package 
It's time to test the entire Raw package. In Solution 
Explorer, right-click the package and choose 
Execute Package. (If you aren't given that option, 
make sure that you stopped the debugger.) 

When you execute a package, each task will run 
in succession, turning from yellow (i.e., running) 
to green (successfully completed the task) or red 
(failed to complete the task). When every task turns 
green, shout “Green!” and put two fists in the air. 
This is an essential step in SSIS package building. 

If a task turns red, you can try to fix the 
problem immediately or you can temporarily disable 
the task. To do the latter, you need to return 
to the design mode by stopping the debugger, 
right-clicking the task you want to disable, and 
choosing Disable. 

Disabling tasks within the Control Flow tab is 
useful for troubleshooting because it allows you 
to continue testing other parts of the package 
when one or more tasks have outstanding issues 
or incomplete logic. However, disabling a task 
only solves the problem in the short term. You'll 
still need to find and fix the problem later. Note 
that you can disable only tasks on the Control Flow 
tab (including Data Flow Tasks); you can't disable 
the individual components on the Data Flow tab. 


Stay Tuned 

So far, you created two databases to house the 
data that will be in the data warehouse. You also 
learned a quick and easy way to build the SSIS 
package that will move data from the source 
system into the Raw tables. Although SSIS experts 
would probably build the SSIS package from scratch 
instead of using SSMS's Import/Export Wizard, the 
wizard is a boon for SSIS novices who must build one 
quickly. In the next article, I'll show you how to use the 
same technique to build the package that will move 
data from the Raw tables into the Stage tables. 
The idea behind data partitioning is really 
pretty simple: Rather than storing data in 
one really large table, you spread the data out 
among many smaller tables. When used properly, this 
technique can aid in the maintenance and scalability 
of storing data, and a good partitioning strategy lets 
you increase the efficiency of operations against that 
data. Partitioning lets you spread your data among 
file groups on different drives as well and makes it 
easier to re-index smaller segments of the overall data 
set. However, although the idea is relatively simple, 
historically, the implementation hasn’t been. Before 
SQL Server 2005, partitioning was accomplished by 
spreading data among separate physical tables and 
joining the tables as needed through the use 
of views. This method lent itself to a slew of 
challenges. First, any time a new table was 
added or an old one removed, definitions 
would need to be changed to accommodate 
the underlying table changes. Special logic 
would have to be added to the system to indicate into 
which tables new records should be inserted. Also, 
queries against partitioned views were typically slow, 
especially when data was distributed among different 
servers. 

The Developer and Enterprise Editions of SQL 
Server 2008 and SQL Server 2005 brought changes 
that made partitioning much easier. Instead of defining 
different tables and hashing them together, as in 
previous versions, SQL Server would let you define a 
single table as well as define how your data would be 
logically and physically distributed through the use 
of schemes and functions. In addition, SQL Server 
would manage the distribution in the background 
so that you didn't have to. Your data would still be 
in smaller tables, as it had been in previous versions, 
but SQL Server would handle those details in the 
background. In this article, my goal is to give you an 
overall high-level view of some of the challenges I've 
faced and the differences I've noticed between SQL 
Server 2008 and SQL Server 2005 performance when 
I run the same queries against nonpartitioned and 
partitioned tables. 


Design and Implementation 

I won't cover partitioning design and implementa- 
tion in extensive detail. Just know that you need to 
define a partition function to specify how you want 
your data logically distributed. With that definition, 
you specify your value barriers. You must then cre- 
ate a partition scheme to physically distribute your 
partitions among file groups based on the logical 
barriers you defined in your partition function. When 
you create a clustered index on your table, you 
specify the partition scheme that will be used to par- 
tition the table (alternatively, you could use a unique 
constraint). You achieve many administrative and 
maintenance-related benefits by partitioning your 
tables, such as the ability to move data in and out of 
tables quickly by “switching” partitions. Partitioning 
also allows for rebuilding indexes on one or more 
partitions individually. You can find a highly useful 
white paper about partitioning at 
msdn.microsoft.com/en-us/library/ms190787.aspx. 
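
To make the maintenance benefit a bit more concrete, here's a short sketch of per-partition work against the test objects that Listing 1 creates. It assumes the object names in Listing 1 are written with underscores (PartitionFunction_SQLServerNation, cdx_PartitionedTable) and that the listing has already been run in tempdb.

```sql
USE tempdb;
GO

-- Ask the partition function which partition a given PartitionID value maps to.
SELECT $PARTITION.PartitionFunction_SQLServerNation(5) AS PartitionNumber;

-- Rebuild the clustered index on that one partition only,
-- leaving the other partitions untouched.
ALTER INDEX cdx_PartitionedTable ON dbo.PartitionedTable
REBUILD PARTITION = 5;
```

On a large table, rebuilding one partition at a time keeps each maintenance operation small and lets you target only the partitions that have become fragmented.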


Query Performance 

When it comes to querying partitioned data, you 
face performance challenges, although current chal- 
lenges pale in comparison to those encountered with 
pre-SQL Server 2005 versions. At first glance from 
a high level, it might seem that query performance 
for nonpartitioned and partitioned tables should be 
the same. But because partitioned data is spread out 
among partitions rather than being in the same loca- 
tion, it becomes trickier for the database engine to 
easily locate data. However, if you design your parti- 
tioning strategy so that your queries always point to 
a particular partition, you can see query performance 
benefits similar to those for a nonpartitioned table. 
These performance benefits can grow dramatically if 
you're dealing with huge amounts of partitioned data. 
If a majority of your queries will span partitions (e.g., 
data from different time ranges, different customers), 
then achieving good performance is trickier still. Note 
that if you find that your queries continually span 
partitions, you might need to reevaluate your parti- 
tioning strategy and consider whether this feature is 
the right choice for your environment. 

As I compared SQL Server 2008 and SQL Server 
2005 performance running the same queries against 
nonpartitioned and partitioned tables, I intentionally 
kept the tests as straightforward as possible. This 
approach lets you easily see the differences between 
the different methods. Results might—and probably 
will—vary when higher loads of data are examined 
and when partitioning strategies and query criteria 
change. 

The code shown in Listing 1 sets up the tables 
and indicates how they're partitioned. I've designed 
my partitioning function to distribute data across 10 
different partitions, all of which are in my primary file 
group. In a perfect world, I would have these parti- 
tions on different disks or sets of disks, but for this 
scenario, placing all of the partitions in the primary 
file group should be fine. 

In this setup, I load one million records into a 
table named NonPartitionedTable. Each column in 
this table will play a different role in the examples pre- 
sented. The ValueColumn contains a globally unique 
identifier (GUID), which I'll later use to return a 
single record lookup. The PartitionID is a hashed 
value that will be part of my clustered index for my 
partitioned table and will serve as the partitioning 
key for data distribution. If you partition a table, the 
partition key must be included in the clustered index 
definition because partitioning physically distributes 
a table, and a clustered index is physically a part of 
the table. Note that I'm not designing this partition 
strategy on a date-based element. Instead, I mimic 
an environment in which data is segmented based on 
customer-related data. 
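
After the setup script has run, you can check how the rows were actually distributed by querying the sys.partitions catalog view. This is a sketch that assumes Listing 1 was run in tempdb and that the partitioned table is named PartitionedTable, as in the listing.

```sql
USE tempdb;
GO

-- One row per partition of the clustered index, with its row count.
SELECT p.partition_number, p.rows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID(N'dbo.PartitionedTable')
  AND p.index_id = 1              -- 1 = the clustered index
ORDER BY p.partition_number;
```

If I read Listing 1 correctly, the PartitionID values run from 0 through 9 while the RANGE LEFT boundaries run from 1 through 9, so values 0 and 1 should both land in the first partition and the last partition should stay empty. A query like this one is a quick way to confirm how your own partition function behaves.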

The GroupedValue is another hashed column that 
I'll use to query values across partitions and return 
an aggregated set of data. This type of query is of 
particular importance because many queries that 
are run against partitioned tables can’t specify the 
partitioning key as part of the criteria because records 
from multiple partitions must be returned. The per- 
formance of such queries is critical and should play 
a significant role in your design strategy if you deter- 
mine that you need to partition your database tables. 


After the data is loaded into my nonpartitioned 
table, I simply use a SELECT INTO statement to 
make a copy of the table named PartitionedTable. 
From there, I build indexes on each table. When I 
build the clustered index on the PartitionedTable, I 
specify my partitioning scheme. 


LISTING 1: SetupAndDataLoad.sql 

USE tempdb 
GO 

-- Drop the test tables if they already exist. 
IF OBJECT_ID('NonPartitionedTable','u') IS NOT NULL 
DROP TABLE NonPartitionedTable 
GO 
IF OBJECT_ID('PartitionedTable','u') IS NOT NULL 
DROP TABLE PartitionedTable 
GO 

CREATE TABLE NonPartitionedTable 
( 
IDColumn INT IDENTITY(1,1) NOT NULL, 
ValueColumn VARCHAR(36), 
PartitionID TINYINT, 
GroupedValue TINYINT 
) 
GO 

-- Drop the partition scheme first because it depends on the function. 
IF EXISTS(select 1 from sys.partition_schemes WHERE name = 
'PartitionScheme_SQLServerNation') 
DROP PARTITION SCHEME PartitionScheme_SQLServerNation 
GO 
IF EXISTS(select 1 from sys.partition_functions WHERE name = 
'PartitionFunction_SQLServerNation') 
DROP PARTITION FUNCTION PartitionFunction_SQLServerNation 
GO 

-- Nine boundary values produce 10 partitions. 
CREATE PARTITION FUNCTION PartitionFunction_SQLServerNation(TINYINT) 
AS RANGE LEFT FOR VALUES(1,2,3,4,5,6,7,8,9) 
GO 

-- Map every partition to the primary file group. 
CREATE PARTITION SCHEME PartitionScheme_SQLServerNation 
AS PARTITION PartitionFunction_SQLServerNation ALL TO ([Primary]) 
GO 

SET NOCOUNT ON 

-- Load one million rows into the nonpartitioned table. 
DECLARE @x INT 
SET @x = 1 

WHILE @x <= 1000000 
BEGIN 
INSERT INTO NonPartitionedTable(ValueColumn, PartitionID, GroupedValue) 
SELECT CAST(NEWID() AS VARCHAR(50)), @x%10, CAST((@x*RAND()*145682) AS 
BIGINT)%10 

SET @x = @x + 1 
END 

-- Copy the data into the table that will be partitioned. 
SELECT * 
INTO PartitionedTable 
FROM NonPartitionedTable 

CREATE UNIQUE CLUSTERED INDEX cdx_NonPartitionedTable on 
NonPartitionedTable(IDColumn, PartitionID) 

-- Building the clustered index on the partition scheme partitions the table. 
CREATE UNIQUE CLUSTERED INDEX cdx_PartitionedTable on 
PartitionedTable(IDColumn, PartitionID) 
ON PartitionScheme_SQLServerNation(PartitionID) 

CREATE NONCLUSTERED INDEX idx_NonPartitionedTable_ValueColumn on 
NonPartitionedTable(ValueColumn) 

CREATE NONCLUSTERED INDEX idx_PartitionedTable_ValueColumn on 
PartitionedTable(ValueColumn, PartitionID) 
ON PartitionScheme_SQLServerNation(PartitionID) 

CREATE NONCLUSTERED INDEX idx_NonPartitionedTable_GroupedValue on 
NonPartitionedTable(GroupedValue) 

CREATE NONCLUSTERED INDEX idx_PartitionedTable_GroupedValue on 
dbo.PartitionedTable(GroupedValue, PartitionID) 

DECLARE @ValueColumn VARCHAR(36), @PartitionID TINYINT 

SELECT TOP 1 @ValueColumn = ValueColumn, @PartitionID = PartitionID 
FROM dbo.NonPartitionedTable 

SELECT ValueColumn 
FROM dbo.NonPartitionedTable 
WHERE ValueColumn = @ValueColumn 

SELECT ValueColumn 
FROM dbo.PartitionedTable 
WHERE ValueColumn = @ValueColumn 

[Execution plan: Query 1 uses an Index Scan on NonPartitionedTable. Query 2 uses an Index Seek on NonPartitionedTable. Query 3, at 84% of the batch cost, uses a Constant Scan and an Index Seek on PartitionedTable.] 

Figure 1: Retrieving a single value from NonPartitionedTable 


Table 'NonPartitionedTable'. Scan count 1, logical reads 3, physical reads 0, 
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

Table 'PartitionedTable'. Scan count 10, logical reads 27, physical reads 0, 
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

Figure 2: The output from SET STATISTICS IO ON in SQL Server 2005 


DECLARE @ValueColumn VARCHAR(36), @PartitionID TINYINT 

SELECT TOP 1 @ValueColumn = ValueColumn, @PartitionID = PartitionID 
FROM dbo.NonPartitionedTable 

SELECT ValueColumn 
FROM dbo.NonPartitionedTable 
WHERE ValueColumn = @ValueColumn 

SELECT ValueColumn 
FROM dbo.PartitionedTable 
WHERE ValueColumn = @ValueColumn AND PartitionID = @PartitionID 

[Execution plan: each of the three queries accounts for 33% of the batch cost. Query 1 uses an Index Scan on NonPartitionedTable, Query 2 uses an Index Seek on NonPartitionedTable, and Query 3 uses a single Index Seek on PartitionedTable.] 

Figure 3: Performance before and after adding PartitionID to the query criteria 

After the data is loaded, I can start digging in to 
see how queries against nonpartitioned and partitioned 
tables differ. I'll run a series of queries against 
a nonpartitioned table and a partitioned table in 
a SQL Server 2005 instance, followed by the same 
set of queries against a nonpartitioned table and a 
partitioned table in a SQL Server 2008 instance. I'll 
compare and contrast the execution plans as well as 
the IO activity between the two environments. 

As Figure 1 shows, I retrieve a single value from 
the NonPartitionedTable table and assign it to a vari- 
able. I then specify this variable as the predicate value 
in queries against my two tables. 

You can see that the query against the Non- 
PartitionedTable yields execution plans drastically 
different from the same statement against the 
PartitionedTable. The NonPartitionedTable query 
produces an expected single Index Seek operation, 
whereas the query against the PartitionedTable 
produces an Index Seek, a Constant Scan, and an 
expensive Nested Loop join to compile the result 
set. The Constant Scan, although showing a cost 
of 0 percent, is really a scan of all of the available 
partitions to determine if a particular partition can 
help satisfy the query. This simple query against a 
partitioned table can be quite expensive. 

In SQL Server 2005, if a partitioned table has only 
one partition involved in a query, all available schedul- 
ers could be used to satisfy the query. However, if the 
query spanned multiple partitions, only one thread 
would be used per partition, regardless of how many 
schedulers the server had available to contribute to a 
parallel execution plan. SQL Server 2008 remedies 
this limitation by allowing all available threads to 
access partitions as necessary. This change is more 
of an advantage in OLAP environments, where read 
requests are intensive and can really benefit from 
parallel queries, than in OLTP environments, where 
queries designed to take advantage of parallelism 
might introduce a performance bottleneck and should 
be evaluated. 
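
One way to evaluate the effect of parallelism on a given query is to run it with and without a cap on the degree of parallelism, then compare the plans and timings. The following is a minimal sketch, assuming the aggregate query and table names that appear in this article's figures; the dbo schema qualifier is an assumption, and MAXDOP 1 is only an example value.

```sql
-- Baseline: let the optimizer choose a serial or parallel plan.
SELECT GroupCount = COUNT(IDColumn)
FROM dbo.PartitionedTable
WHERE GroupedValue = 5;

-- Comparison run: force a serial plan with the MAXDOP query hint.
SELECT GroupCount = COUNT(IDColumn)
FROM dbo.PartitionedTable
WHERE GroupedValue = 5
OPTION (MAXDOP 1);
```

Comparing the two runs on a test system gives a rough idea of whether parallelism helps or hurts for that workload.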

Before I continue, ГЇЇ introduce the command ГЇЇ 
use to show the IO activity that results from a query. 
The command is SET STATISTICS IO ON, and it 
must be run in the same session as the queries that you 
run. This command tells SQL Server to display disk 
activity generated by the T-SQL statements executed 
in the current session. When I look at the output from 
this statement, I'm usually most concerned with the 
logical reads figure. This figure represents the number 
of pages read from the data cache. It should remain 
relatively the same for the same query given the same 
data set. As a general rule, when I'm tuning queries, 
I try to get the number of logical reads as low as 
possible. 
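
As a rough illustration of how the command is used, the sketch below turns IO statistics on, runs one of the test queries, and turns the statistics off again. The table and column names come from the figures in this article; the way the lookup value is obtained is only one possible approach.

```sql
-- Enable per-statement IO reporting for the current session only.
SET STATISTICS IO ON;

-- Read one existing value from the table to use as the predicate.
DECLARE @ValueColumn VARCHAR(36);

SELECT TOP 1 @ValueColumn = ValueColumn
FROM dbo.NonPartitionedTable;

-- The Messages tab reports scan count and logical reads
-- for each table this statement touches.
SELECT ValueColumn
FROM dbo.PartitionedTable
WHERE ValueColumn = @ValueColumn;

-- Turn reporting back off when finished.
SET STATISTICS IO OFF;
```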

Figure 2 shows the output from SET STATISTICS 
IO ON for the query shown in Figure 1. You can see 
that the query against the NonPartitionedTable table
incurred three logical reads, whereas the query against
the PartitionedTable table incurred 27. From an IO point
of view, the query against the PartitionedTable table
is nine times as expensive as the query against the
NonPartitionedTable table.

When I add the partitioning key (PartitionID) 
to the query criteria, the performance against the 
PartitionedTable table improves. In fact, the result 
of the addition yields a performance improvement 
that mimics the performance of the query against 
the NonPartitionedTable table. This improvement
occurs because SQL Server doesn't incur
the cost of determining which partition holds the
record in question. Figure 3 shows the performance
difference.

As Figure 4 shows, in terms of IO, the two queries 
are now identical because SQL Server no longer has 
to scan partitions. Both tables now incur the same 
number of index seeks, which results in the same 
number of pages accessed. 

Now, let's examine the results of running the same 
set of queries against my SQL Server 2008 instance. 
As Figure 5 shows, the execution plan for SQL Server 
2008 is drastically different from the execution plan 
for SQL Server 2005. The Constant Scan and the 
Nested Loop join operator are gone. The visual rep- 
resentation of the execution plan for the two tables is 
now the same. 

However, the queries still incur the same number 
of reads from the database as they did in SQL Server 
2005. Even though the visual Constant Scan is gone, 
from an IO perspective, the query against the Par- 
titionedTable table in SQL Server 2008 remains the 
same. The relative costs of the queries are the same 
as with SQL Server 2005. 

Figure 6 shows that when I run the same query 
against the SQL Server 2008 instance that I ran 
against the SQL Server 2005 instance (refer to Fig- 
ure 2), I get the same result: a 9 to 1 ratio in terms 
of IO. 

After the PartitionID is added to the query cri- 
teria, as Figure 7 shows, the performance of the two 
queries is the same in SQL Server 2008. This change 
in behavior mimics that of the SQL Server 2005 
instance when the PartitionID column was added. 

In Figure 8, you can see that after I add the 
PartitionID criteria, the logical reads for the two 
queries are the same. Again, this equality occurs 
because SQL Server doesn't have to determine 
which partition to query on its own. I've specified 
the partition by using the PartitionID in the query 
predicate. 

You can see the significance of telling the query 
which partition to access to retrieve the data. If I 
don't specify the partition, SQL Server has to deter- 
mine it, which can be costly. 
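
To see how key values map to partitions, the built-in $PARTITION function can be handy. The sketch below is illustrative only: pfPartitionID is a hypothetical partition function name (the article doesn't name the one it used), and the key value 3 is arbitrary.

```sql
-- pfPartitionID is a hypothetical partition function name; substitute your own.
DECLARE @PartitionID TINYINT;
SET @PartitionID = 3;  -- example value only

-- Which partition number does this key value map to?
SELECT $PARTITION.pfPartitionID(@PartitionID) AS PartitionNumber;

-- How many rows does each partition of the table hold?
SELECT $PARTITION.pfPartitionID(PartitionID) AS PartitionNumber,
       COUNT(*) AS RowsInPartition
FROM dbo.PartitionedTable
GROUP BY $PARTITION.pfPartitionID(PartitionID)
ORDER BY PartitionNumber;
```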
Table 'NonPartitionedTable'. Scan count 1, logical reads 3, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'PartitionedTable'. Scan count 1, logical reads 3, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Figure 4
Identical queries in terms of IO


DECLARE @ValueColumn VARCHAR(36), @PartitionID TINYINT

SELECT TOP 1 @ValueColumn = ValueColumn, @PartitionID = PartitionID
FROM dbo.NonPartitionedTable

SELECT ValueColumn
FROM dbo.NonPartitionedTable
WHERE
    ValueColumn = @ValueColumn

SELECT ValueColumn
FROM dbo.PartitionedTable
WHERE
    ValueColumn = @ValueColumn

[Execution plan: Query 1 (0% of batch), Index Scan (NonClustered) on NonPartitionedTable; Query 2, Index Seek (NonClustered) on NonPartitionedTable; Query 3 (83% of batch), Index Seek on PartitionedTable]

Figure 5
Comparing the SQL Server 2008 execution plan to the SQL Server 2005 execution plan


Table 'NonPartitionedTable'. Scan count 1, logical reads 3, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'PartitionedTable'. Scan count 10, logical reads 27, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Figure 6
IO ratio after running query against SQL Server 2008


DECLARE @ValueColumn VARCHAR(36), @PartitionID TINYINT

SELECT TOP 1 @ValueColumn = ValueColumn, @PartitionID = PartitionID
FROM dbo.NonPartitionedTable

SELECT ValueColumn
FROM dbo.NonPartitionedTable
WHERE
    ValueColumn = @ValueColumn

SELECT ValueColumn
FROM dbo.PartitionedTable
WHERE
    ValueColumn = @ValueColumn AND
    PartitionID = @PartitionID

[Execution plan: the queries against NonPartitionedTable and PartitionedTable each show a single Index Seek (NonClustered) operator with a cost of 100%]

Figure 7
Query performance after adding PartitionID in SQL Server 2008
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'PartitionedTable'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Figure 8
Logical reads after adding PartitionID to SQL Server 2008 instance


SELECT GroupCount = COUNT(IDColumn)
FROM NonPartitionedTable
WHERE
    GroupedValue = 5

SELECT GroupCount = COUNT(IDColumn)
FROM PartitionedTable
WHERE
    GroupedValue = 5

[Execution plan: Query 1 (28% of batch) against NonPartitionedTable; Query 2 (72% of batch) against PartitionedTable]

Figure 9
Execution plan for aggregated query run against SQL Server 2005


Table 'NonPartitionedTable'. Scan count 1, logical reads 152, physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'PartitionedTable'. Scan count 10, logical reads 171, physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Figure 10
IO output from the aggregated query run against SQL Server 2005


SELECT GroupCount = COUNT(IDColumn)
FROM NonPartitionedTable
WHERE
    GroupedValue = 5

SELECT GroupCount = COUNT(IDColumn)
FROM PartitionedTable
WHERE
    GroupedValue = 5

[Execution plan: Query 1 (48% of batch) against NonPartitionedTable; Query 2 (52% of batch) against PartitionedTable]

Figure 11
Aggregation queries run against SQL Server 2008


Table 'NonPartitionedTable'. Scan count 1, logical reads 151, physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'PartitionedTable'. Scan count 10, logical reads 174, physical reads 0, read-ahead reads 0,
lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Figure 12
SQL Server 2008 STATISTICS IO results


January 2011 


Aggregations are common types of queries exe- 
cuted against databases. Questions such as “How many 
cars did we sell each month for the past 12 months?" 
or “What is the average number of calls taken by active 
customer service representatives (CSRs) by day this 
month?" call for aggregations. When you deal with 
partitioned tables, you're almost certainly going to be 
running queries that span multiple partitions. Span- 
ning partitions to pull data together requires extra 
overhead for the database engine. 
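
To make the first question concrete, here is a minimal sketch of a monthly aggregation. The dbo.CarSales table and its SaleDate column are hypothetical and aren't part of the test tables used in this article. If such a table were partitioned by date, this query would span roughly 12 months of partitions.

```sql
-- Hypothetical table: dbo.CarSales, one row per car sold, with a SaleDate column.
SELECT YEAR(SaleDate)  AS SaleYear,
       MONTH(SaleDate) AS SaleMonth,
       COUNT(*)        AS CarsSold
FROM dbo.CarSales
WHERE SaleDate >= DATEADD(MONTH, -12, GETDATE())
GROUP BY YEAR(SaleDate), MONTH(SaleDate)
ORDER BY SaleYear, SaleMonth;
```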

In Figure 9, you can see the execution plan for an 
aggregated query run against my SQL Server 2005 
instance. Notice that Constant Scan comes into play 
for the SQL Server 2005 query. 

Figure 10 shows the IO output for the query in 
Figure 9. As expected, the query against the parti- 
tioned table still incurs more logical reads than the 
same query against the nonpartitioned table. Figure 11 
shows the same set of aggregation queries run against 
the SQL Server 2008 instance. 

The execution plan indicates that for SQL Server
2008 the relative cost of the aggregation against the
PartitionedTable table is still higher than that of the
same query against the NonPartitionedTable table.
However, although the difference is still present, in
terms of relative cost it has improved over the SQL
Server 2005 version.

Figure 12 shows the difference in IO between the 
two queries for SQL Server 2008. Again, in terms of 
logical reads, the query against the PartitionedTable 
is still more expensive. 

Partitioning is a powerful technique, one that
brings advantages to data manageability and admin- 
istration. However, partitioning can bring certain 
challenges when it comes to query performance, and 
you must consider its implementation carefully for 
your environment. 


Partitioning Query 
Performance 
In this article, I've explored how the same queries
perform when run against partitioned and nonpartitioned
tables on SQL Server 2008 and SQL Server 2005.
Queries run against a partitioned table in SQL Server
2005 can perform differently from those run against
a partitioned table in SQL Server 2008 in terms of
parallel plan generation. Most important, queries
against a partitioned table perform significantly
better when the partitioning key is supplied. As
mentioned earlier, my goal was a high-level view of
how partitioning tables might affect your query performance
on SQL Server 2008 and SQL Server 2005.
However, it's clearly not an exhaustive view. If you've
had different experiences from those I've presented,
I'd love to hear about them. Email me at
chapman.tim@gmail.com.
InstantDoc ID 128782 
Sizing Up Data Modeling Software

The best fit depends on your job duties and budget


Data modeling software can help you get a picture
of what your database environment looks
like and what it contains. It can also help you design
databases and generate code. Despite the sometimes 
hefty price tags of data modeling packages, they're 
true productivity tools because they significantly 
shorten the time it takes to perform tasks. Tasks that 
took weeks take only days, and tasks that took days 
take only hours. 

Data modeling software typically requires a long 
learning curve because of the complex capabilities 
built into it. But a good data modeling software pack- 
age should have a relatively straightforward "getting 
started" path. It should also provide most, if not all, 
of the following features: 

* Ability to capture and define business terms (i.e., a glossary)
* Ability to capture and document business requirements
* Ability to capture and document business processes
* Support for multiple database platforms (not just SQL Server)
* Support for two levels (minimum) or three levels (optimum) of data
  modeling (i.e., conceptual, logical, and physical)
* Error checking on all model levels, based on industry standards for
  modeling best practices
* Ability to modify best practices to fit individual corporate standards
* Ability to customize and modify model templates to fit corporate
  presentation standards
* Ability to customize the file management system to fit corporate or
  personal needs
* Ability to reverse engineer, document, and inventory existing databases
* Ability to verify a model by comparing it to stored model standards,
  prior to generating SQL Data Definition Language (DDL) code
* Ability to forward engineer a database (i.e., create a new database or
  refactor an existing one), either by generating SQL DDL code or by
  creating a live database using ANSI-standard SQL code or the
  appropriate SQL dialect for the target database platform (user's choice)
* Ability to customize the structure of generated DDL code, such as
  defining keys and constraints that are internal or external to a table
  definition
* Ability to define and code file groups, partitions, users, roles, and
  other infrastructure artifacts
* Ability to define and code tables, columns, defaults, constraints, and
  other database artifacts
* Ability to define and code indexes, stored procedures, triggers,
  user-defined functions (UDFs), and other code-based artifacts
* Ability to compare database models at the physical level to map the
  differences between them
* Ability to share models in a team environment
* Ability to generate reports in multiple formats for publication and
  sharing
* Documentation in different types of media (e.g., print, online, video)


Using these features as criteria, I recently
tested six popular data modeling packages:

* Quest Software's Toad Data Modeler for SQL Server 3.5, Xpert Edition
* Altova's DatabaseSpy 2011, Enterprise Edition
* Datanamic Solutions' DeZign for Databases 6.2, Expert Edition
* Embarcadero Technologies' ER/Studio Enterprise 8.5
* CA's ERwin Data Modeler (DM) 7.3
* Sybase's PowerDesigner Studio Enterprise 15.2


Most of the data modeling packages I evaluated 
have the core features I just mentioned. If a product 
doesn't have a feature or if it has a special extended 
feature, I note it in the individual reviews that follow. 
I also created a chart that compares the six products’ 
features. Table 1 contains an excerpt from that chart. 
You can see the chart in its entirety by going to
www.sqlmag.com, entering 128921 in the InstantDoc ID
text box, clicking Go, then clicking the Table 1 link.

Note that each vendor has its own nomenclature
when referring to the components in its software.
To simplify the product evaluations, I use generic 
terms (e.g., tree view, workspace) in the individual 
reviews and the product comparison chart. 


Michelle A. Poolet
(MAPoolet@MountVernonDataSystems.com)
is a contributing editor for SQL Server
Magazine and cofounder of Mount Vernon
Data Systems and Six Sigma Uptime. She
teaches database design and programming,
and is a Zachman Certified™ Enterprise
Architect.
See Table 1 in its entirety
at InstantDoc ID 128921.
TABLE 1: Excerpt from Product Comparison Chart

Column headings:
* Package Name
* Ability to Capture and Define Business Requirements
* Ability to Design a New or Existing Database in a Graphical Tool
* Ability to Reverse Engineer, Document, and Inventory Existing Databases
* Ability to Modify Built-In Standards of Practice
* Ability to Forward Engineer by Creating ANSI-Standard or Platform-Specific SQL Code
* Ability to Compare Database Models and Map the Differences Between Them

DatabaseSpy 2011, Enterprise Edition: Existing databases only; Yes; T-SQL only; Yes, but rudimentary
DeZign for Databases 6.2, Expert Edition: Yes; Yes; Yes; Yes; Yes
ER/Studio Enterprise 8.5: Yes; Yes; Yes; Yes; Yes; Yes
ERwin Data Modeler 7.3: Need ERwin Process Modeler; Yes; Yes; Yes; Yes
PowerDesigner Studio Enterprise 15.2: Yes; Yes; Yes; Yes; Yes; Yes
Toad Data Modeler for SQL Server 3.5, Xpert Edition: Yes; Yes; Yes; Yes; Yes

Toad Data Modeler for SQL 
Server 3.5, Xpert Edition 

Installing Toad was incredibly easy and straightfor- 
ward. However, the Toad installation dumps every- 
thing on the C drive. It would be better if the user 
could decide on the default file location. 

Toad uses the conventional data modeling layout— 
that is, a tree view on the left, a message area at the 
bottom, and a workspace that takes up most of the 
screen’s real estate. In Toad, the tree view contains 
multiple tabs, with each tab containing a project that 
you’ve brought into your workspace, plus the various 
artifacts that belong to the project (e.g., file groups, 
partition schemes, entities and attributes, relationships, 
views). The tabbed workspace is where the data model 
is displayed and edited. The message area contains 
two small windows: Application View and Message 
Explorer. Application View is a condensed replica of the
tree view. It lets you quickly switch from one view of the
model to another by means of a tab. Message Explorer 
is the log file. It displays errors, warnings, and any other 
messages that Toad needs to relay to the user. 

Toad has some neat features, such as a place for
notes in the tree view. You can literally build a punch
list of things to do, which is necessary because of
the numerous steps involved when creating a new
database or refactoring an old one. Toad has a lot
of right-click menus that are context-sensitive. Those
menus and keyboard shortcuts make repetitive
actions fast and easy.

Toad doesn’t have a centralized repository for 
team development, but it does have version control. 
You need to be in Expert Mode to use versioning, 
and only projects can take part in the version control 
system. Shifting Toad into Expert Mode also lets you 
write and edit scripts, customize package features, and 
do other advanced tasks. 

One productivity feature I really like is the rollover 
highlighting of primary-to-foreign-key relationships. 
When you place the cursor on a relationship line, 
the parent and child columns in the related tables 
are highlighted—no guessing or searching needed. 
Managing the model in the workspace is easy and 
straightforward. For example, getting relationship 
lines untangled is as simple as right-clicking, select- 
ing Straighten All Lines, right-clicking again, and 
choosing Change Lines to Right Angle. No data 
modeling package that I know of produces a compact 
auto-layout, but Toad’s screen management makes 
re-organizing and re-orienting artifacts easy. 
Color-coding of the keys is another feature that I
really like. A red key indicates a primary identifier/key. 
(There's no problem with identity properties, as the 
property window is fully compliant with SQL Server.) 
A green key indicates a foreign key. 

Toad isn't perfect, though: 

* File placement and path control are a struggle,
especially once you're under version control.
If you deviate from the default location, file
management is nearly impossible, even after
modifying the Paths entries in the Options page
on the Settings menu.

* Placing relationship lines can be troublesome 
when creating a model. For clarity, you need to 
have absolute control over where the relation- 
ship lines are positioned in the model, and Toad 
doesn't offer that. 

* The Help files aren't inclusive. While working 
with the model, I encountered unfamiliar terms, 
but I couldn't find any explanation of those 
terms in the local Help files. 

* Right-clicking to locate the artifact in the tree 
view doesn't always locate it in the workspace 
view. When you're dealing with a data model 
of any size, being able to quickly find a specific 
object in the model view is a necessity and not 
just a "nice to have" feature. 

* The print facility seemed to be crippled. I 
couldn't print in landscape mode. 


Toad is available in the Professional, Xpert, and 
Development Suite editions. It's also offered as a 
freeware package, which is limited to a maximum of 
five users within an organization. 


TOAD DATA MODELER FOR SQL SERVER 3.5, XPERT EDITION

Pros: Intuitive to use; designed to appeal to developers and DBAs;
version control for managing models

Cons: No way to map to requirements; no support for conceptual data
modeling; file placement can be a struggle

Rating:
Price: $1,045

Recommendation: This fun-to-use tool will keep DBAs and developers on
their toes and help them do more with less. It's a good entry-level
tool for aspiring data architects.

Contact: Quest Software * 949-754-8000 or 800-306-9329 * www.quest.com


SQL Server Magazine * www.sqlmag.com 


DATA MODELING SOFTWARE 


DatabaseSpy 2011, Enterprise 
Edition 

Installing DatabaseSpy was straightforward, but I 
had to connect to a live SQL Server machine as part 
of the startup process. For this review, I was con- 
necting to a remote server with a SQL Server logon. 
I was prompted for both local and remote credentials,
which got a little confusing until I figured out what 
DatabaseSpy wanted. 

DatabaseSpy suffers from an identity crisis. It's
like SQL Server Management Studio's (SSMS's)
Database Diagrams tool (aka the SQL Diagrammer)
combined with Red Gate Software's SQL
Compare and SQL Data Compare, plus a report
builder. In DatabaseSpy's UI, there are various tree
views that resemble SSMS and Visual Studio. The
workspace, which is used for editing the database
tables, resembles the SQL Diagrammer. There's a
separate Overview window, which lets you pan the
workspace, and a query window, which you can
use to query live data. As with the SQL Diagrammer,
you're connected to a live database, so changes
made to the schema are immediately applied to
that database.

The entire DatabaseSpy model is loaded into 
memory, which results in speedy editing and display. 
However, this can be problematic if you have large 
databases (large in the sense of having many objects 
in the schema) or a client computer with limited 
memory. While the load wizard is working, you can't 
access the Help files or any other DatabaseSpy feature 
because the load wizard is modal. 

DatabaseSpy supports only the physical model, so 
you can only edit existing objects or add new objects 
to the database that you're connected to. Because 
you're connected to a live database, you can inadver- 
tently modify tables and columns and cause chaos 
(or worse) in a development or production system. I 
found other problems as well: 

* The metadata isn't readily revealed, and there's
no lineage tracing capability.

* You can't create new databases.

* In the workspace, scrolling and object resizing are
neither efficient nor effective.

* The software package itself was fairly unstable.
It appeared to hang multiple times, even
though I was using the 64-bit version of the
product, which should have enhanced memory
management.

* Aborting the Help file aborted the entire software
package, which resulted in a loss of work.


DatabaseSpy comes in two editions: Enterprise 
and Professional. Both are available in 64-bit versions 
at no extra cost. You can purchase DatabaseSpy sepa- 
rately or as part of a package called the MissionKit 
2011 for Software Architects. 
DATABASESPY 2011, ENTERPRISE EDITION

Pros: SQL Diagrammer-style layout; in-memory modeling provides
speedy editing and display

Cons: Must be connected to a live database; in-memory modeling limits size
of database; supports only the physical model; no way to create new databases

Rating:

Price: Starts at $189 (if purchased separately) or $1,739 (if purchased as
part of MissionKit 2011 for Software Architects)

Recommendation: It's SQL Diagrammer on steroids but not a true data
modeling tool.

Contact: Altova * 978-816-1600 or 43-1-545-5155 * www.altova.com
DeZign for Databases 6.2,
Expert Edition 
Installing DeZign for Databases was very easy and 
seamless. After you plug in the license code and key, 
the DeZign application opens with a sample entity 
relationship diagram (ERD) in the workspace, giving 
the experienced user a good first impression and the 
new user a sense of place and purpose. DeZign is set 
up in the conventional data modeling layout. 

I was struck by the high comfort level I encoun- 
tered with DeZign. The visual query builder for 
views makes creating views a snap, and the ability to 
set object naming standards (and other options) by 
project is wonderful. Generated code is arranged in 
correct hierarchy order. For example, the code deletes 
any constraints before deleting the tables to which 
they apply. 
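
The ordering matters because SQL Server won't drop a table that a foreign key still references. The sketch below illustrates the idea with hypothetical table and constraint names; it isn't DeZign's actual generated output.

```sql
-- Hypothetical parent/child tables, for illustration only.
-- 1. Remove the foreign key that makes OrderLine depend on OrderHeader.
ALTER TABLE dbo.OrderLine DROP CONSTRAINT FK_OrderLine_OrderHeader;

-- 2. With the dependency gone, the tables can be dropped in any order.
DROP TABLE dbo.OrderHeader;
DROP TABLE dbo.OrderLine;
```

Without the first statement, dropping the parent table (dbo.OrderHeader) first would fail.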

There's also a high level of ease in managing 
models. For instance, there's a function that lets you 
search for and replace data types across a model, 
which makes data-type management a snap. And a 
drag-screen feature makes it easy to move around in 
a large model. 

Switching platforms (e.g., SQL Server to Oracle) is 
very easy, but it comes with a caveat: The code that's 
going to be generated will be in the dialect of the 
original platform. DeZign has a few other drawbacks 
as well: 

* It lacks F1 context-sensitive Help support, and
the existing Help options are too limited. DeZign
could use more extensive Help menus and video
tutorials.

* There's no format painter, so you can't copy and
paste formats from one artifact to another.

* During model validation, the schema checker
failed to note data inconsistencies between artifacts,
which subsequently caused the execution
of a generated script to fail.

* You can't create a new folder when relocating
data files.

* There's no function (F) key or menu option for
showing the entire model, which is essential for
navigating very large models.

* You can't set standards for the enterprise, such
as the seed and increment values for an identity
property.


DeZign comes in three editions: Standard, 
Professional, and Expert. The Expert Edition is 
the version for both data architects and enterprise 
architects. (Want to know how these two positions 
differ? See the web-exclusive sidebar “What's the 
Difference Between a Data Architect and an Enter- 
prise Architect?” www.sqlmag.com, InstantDoc ID 
129141.) You can purchase DeZign separately or 
as part of a package called the DBTools Bundle 
Expert Edition. 


DEZIGN FOR 
DATABASES 6.2, 
EXPERT EDITION 


Pros: Extremely easy to use and man- 
age, which translates into high productivity 


Cons: No way to modify built-in standards 
for model or reports; no way to set enterprise 
defaults; not enough choices on how to gener- 
ate code for supertype-subtype entities 


Rating: 


Price: Starts at $799 (if purchased separately) 
or $899 (if purchased as part of DBTools Bun- 
dle Expert Edition) 


Recommendation: If you're already comfortable
with data modeling software, if you have
your own method of collecting and documenting
requirements, and if you don't need to customize
standards, then this is a best buy.


Contact: Datanamic Solutions *
31-0-71-5140901 * www.datanamic.com


ER/Studio Enterprise 8.5 

ER/Studio Enterprise is a big download file, so make 
sure you have a lot of room on your local disk before 
starting the process. During the install, you have to 
set up a user account on the Embarcadero Technolo- 
gies website. 

ER/Studio has so much flexibility and so many 
features that it'll take years for anyone to grow into it. 
Data architects who need to maintain strong standards 
will love the domain control built into it. When you 
create an entity, you can choose from a prepopulated 
data dictionary of domains instead of creating every 
attribute from scratch. The data dictionary contains 
standard, reusable attributes similar to user-defined
data types, which have the same data type/length,
definition/meaning, rules, and constraints, no matter
where they're used in the model.

The software uses the conventional data modeling 
layout. There are multiple ways to navigate large models.
The Overview screen gives you a bird's-eye view
of the entire model. The Pan feature lets you pan the 
model directly. The Zoom window enlarges whichever 
artifact your cursor is passing over. 

ER/Studio generates both Data Definition
Language (DDL) code, which creates the database
schema, and Data Manipulation Language (DML) code
for selects, inserts, updates, and deletes, with triggers
to handle updates and deletes of a parent with children
and children without a parent. ER/Studio produces the
cleanest generated SQL code of all the products I reviewed.
In my test, the DDL code executed without a single change
required.

Data modeling software typically requires a long
learning curve because of the complex capabilities
built into it.

There are a few things about ER/Studio that bug
me, but not many:

* Many-to-many relationships at the logical level
are treated as unresolvable, which isn't a true
statement. When you migrate a logical model to
its physical counterpart in any relational database,
the many-to-many relationship resolves to a
join table (also called an intersection entity) with
foreign key attributes inherited from the two parents.
Therefore, the unresolvable assumption that
ER/Studio makes at the logical level is erroneous.

* I couldn't connect to a remote SQL Server
machine until I had it registered with my local
copy of SSMS. This might be problematic for
shops in which data modelers aren't allowed or
choose not to use SSMS.

* The Zoom In and Zoom Out options don't work
correctly. I had to use the Zoom window to zoom
in on artifacts.
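To make the many-to-many point in the first bullet concrete, here is a minimal sketch of how such a relationship resolves to a join table. It uses Python's built-in sqlite3 module rather than SQL Server so that it runs anywhere. The student, course, and enrollment tables and the two helper functions are hypothetical examples, not output from ER/Studio.

```python
import sqlite3

# Hypothetical entities: a student can take many courses, and a course
# can have many students (a many-to-many relationship).
DDL = """
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE course (
    course_id INTEGER PRIMARY KEY,
    title     TEXT NOT NULL
);
-- The join table (intersection entity). Its primary key is built from
-- the foreign keys inherited from the two parent tables.
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL REFERENCES student (student_id),
    course_id  INTEGER NOT NULL REFERENCES course (course_id),
    PRIMARY KEY (student_id, course_id)
);
"""


def build_demo_db():
    """Create an in-memory database with the three tables and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(DDL)
    conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
    conn.executemany("INSERT INTO course VALUES (?, ?)", [(10, "SQL"), (20, "Modeling")])
    conn.executemany("INSERT INTO enrollment VALUES (?, ?)", [(1, 10), (1, 20), (2, 10)])
    conn.commit()
    return conn


def courses_for(conn, student_name):
    """Return the course titles for one student by joining through enrollment."""
    rows = conn.execute(
        """
        SELECT c.title
        FROM student AS s
        JOIN enrollment AS e ON e.student_id = s.student_id
        JOIN course AS c ON c.course_id = e.course_id
        WHERE s.name = ?
        ORDER BY c.title
        """,
        (student_name,),
    ).fetchall()
    return [title for (title,) in rows]
```

Calling courses_for(build_demo_db(), "Ann") walks from student through enrollment to course, which is the navigation path the physical model gives you once the logical many-to-many relationship has been resolved.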


ER/Studio Enterprise is a productivity suite that
contains five ER/Studio software packages: Data
Architect, Business Architect, Software Architect,
Portal, and Repository. Data Architect and Business
Architect can be purchased as standalone products, as
can several ER/Studio add-ons. The evaluation copy
shipped with a central repository and a user portal,
which I didn't review.


ER/STUDIO ENTERPRISE 8.5


Pros: Strong support for standards,
domains, data lineage, and metadata tracking;
excellent screen management tools


Cons: High cost; long learning curve; having to
register a remote SQL Server machine with the
locally installed version of SSMS


Rating:


Price: $5,100, plus $1,260 for annual maintenance
(required)


Recommendation: Although pricey, its
awesome model management features make it
highly suitable for data architects.


Contact: Embarcadero Technologies •
415-834-3131 • www.embarcadero.com


ERwin DM 7.3 

Installing and setting up ERwin DM was easy, 
although the download took a long time. ERwin 
employs the conventional data modeling layout. 

Features such as the data-type standards editor 
and the report template builder add much value to 
this package. For the busy data modeler, being able 
to view all the instances of a single type of artifact 
(e.g., all tables, all columns) in a single list and 
being able to check or edit their common properties 
is a terrific timesaver. Coupling ERwin DM with 
its Saphir option can help anyone struggling with 
extracting metadata from popular third-party ERP 
packages. 

When creating logical models, you can use the
conventional logical data types (e.g., BLOB, string)
or more detailed, platform-specific data types (e.g.,
tinyint, int). You would use logical data types if you
planned to push your logical model to more than one
type of physical platform, such as pushing it to SQL
Server and Oracle. The SQL query tool is handy if
you have to view the data in a connected database.
Rather than using SSMS, you can view it directly in
ERwin DM. The model comparison wizard is one of
the cleanest and easiest to work with that I've found.
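The idea of keeping logical data types in the model and resolving them per target platform can be sketched in a few lines of Python. The mapping table below is an illustrative assumption (one plausible choice of SQL Server and Oracle types), not ERwin DM's actual mapping, and the function names are hypothetical.

```python
# Hypothetical mapping from logical data types to platform-specific types.
# These physical type choices are illustrative assumptions only; a real
# modeling tool ships its own, more complete mapping.
LOGICAL_TO_PHYSICAL = {
    "sqlserver": {
        "string": "nvarchar({length})",
        "integer": "int",
        "blob": "varbinary(max)",
    },
    "oracle": {
        "string": "VARCHAR2({length})",
        "integer": "NUMBER(10)",
        "blob": "BLOB",
    },
}


def physical_type(platform, logical_type, length=50):
    """Resolve one logical type to the physical type for the given platform."""
    template = LOGICAL_TO_PHYSICAL[platform][logical_type]
    # Types without a {length} placeholder are returned unchanged.
    return template.format(length=length)


def column_ddl(platform, name, logical_type, length=50):
    """Build a single column definition for the target platform."""
    return f"{name} {physical_type(platform, logical_type, length)}"
```

With this sketch, the same logical column (say, a string of length 100) yields nvarchar(100) for the SQL Server target and VARCHAR2(100) for the Oracle target, which is the reason to stay with logical types when a model feeds more than one platform.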

Note

At press time, Embarcadero Technologies released
ER/Studio XE, the successor to ER/Studio Enterprise
8.5. Although there wasn't enough time to review
ER/Studio XE for this comparison, it'll be reviewed
in a future issue.


ERwin DM doesn't support any version of SQL
Server prior to SQL Server 2000. The workaround
is to script out the database into a DDL file, then
reverse engineer from that file. I couldn't dispose of
models without first saving them to disk, then using
Windows Explorer to delete the file, which seemed
unnecessarily convoluted. Other annoyances and
problems include:

* The workspace is tabbed, but if you have multiple
models open at the same time, you can't
simply click from tab to tab. You have to choose
which model to display from the top menu,
which is a little counterintuitive, given the
multiple-tab presentation in the workspace.

* The product lacks a direct zoom capability. You 
can't click the zoom tool and apply it to a single 
artifact. Instead, the entire screen zooms in or 
out. This is a tremendous oversight, especially 
since I couldn't find a way to center the object 
that I wanted to see close-up. 

* The Domain Dictionary window can't be 
resized, requiring you to scroll back and forth for 
attribute/column properties, which is annoying. 

* Child windows, such as the Index Manager, 
aren't readily identified in the window banner, 
leading to confusion. 

* Model validation features aren't integrated into 
ERwin DM, which I found very disconcerting. 
To validate a model, you need the ERwin Data 
Model Validator package, which you must pur- 
chase separately. 

* When I created a new logical model, I had difficulty
finding the forward engineering option in
the menus. (It's labeled derive model.)

* When I pushed the logical model to a physical
model, I couldn't find the Check Model/Validate
SQL option or the Forward Engineer option
mentioned in the online Help file.

* I failed to figure out how to generate SQL code.


CA offers the ERwin DM Community Edition,
a freeware version of ERwin DM. CA also offers
several other modeling products (e.g., ERwin Data
Model Validator, ERwin Model Manager) that you
can purchase.


ERWIN DATA 
MODELER 7.3 


Pros: Artifacts easy to use and manage; 
strong standards editors; has SQL query tool 


Cons: Doesn't contain all the operations criti- 
cal to basic data modeling; non-standard option 
names; non-intuitive modeling behaviors 


Rating:


Price: $4,794


Recommendation: The functionality necessary
for data modeling is too fragmented among separate
packages. If you were to purchase the other
packages in addition to ERwin DM, you'd have a
good data architect toolkit.


Contact: CA • 631-342-6000 or
800-225-5224 • erwin.com


PowerDesigner Studio Enterprise 15.2

The PowerDesigner installation DVD comes with a 
set of video tutorials that really help the new user 
get started. (Even existing customers who upgrade 
will find the tutorials helpful because there are so 
many new features in version 15.) Installation was a 
non-event. PowerDesigner uses the conventional data 
modeling layout. 

The newest version of PowerDesigner offers 
enterprise architecture modeling, with support for 
multiple enterprise architecture frameworks such as 
the Zachman Framework, The Open Group Architec- 
ture Framework (TOGAF), and Federal Enterprise 
Architecture Framework (FEAF). Plus, you can build 
your own custom framework. Enterprise architecture 
models let you map all organizational systems and 
processes, aligning stakeholders from all parts of the 
business (and not just technology) with strategies 
and goals. You can organize models by project, cre- 
ate shortcuts for documents, and relocate documents 
all from the PowerDesigner UI. You can link models 
within a project. 

In addition to all the conventional modeling 
notations, PowerDesigner strongly supports the 
object-oriented (OO) world. For example, the OO 
model supports class, object, use case, collaboration, 
sequence, statechart, activity, component, deploy- 
ment, and composite structure diagrams. 

PowerDesigner has strong large-model manage- 
ment capabilities as well. For example, you can divide 
model components (e.g., table artifacts, relationships) 
into packages, which are a form of submodel. Packages
can be nested. File and artifact management
is extremely easy. You can use a form for creating
and editing tables and their associated artifacts (e.g., 
columns, keys, indexes), or you can use the grid-like 
list layout, which is extremely customizable. There is 
a centralized repository, with check-in and check-out 
capabilities. The model's find operation automatically 
presents the target artifact in close-up mode, so it's 
easy to manage. 

Individual or organizational preferences, such as 
display preferences for the model, are easily modifi- 
able, with a large library of options. There's strong 
support for domain data typing. You can create data 
domains and use them across all model levels. There's 
equally strong support for modeling standards, which 
are set by default to industry best practices, but they're 
completely customizable. The impact analysis capa- 
bilities (both cross-model and lineage) are amazing. 
After you've modeled the enterprise, you can run an 
impact analysis to display downstream and upstream 
impacts of a change across the enterprise. 

The report wizard makes report generation quick
and straightforward. The generated reports are
hyperlinked: the metadata links to the relevant artifact
and vice versa. Reports are saved with the model, after
which you can edit them in the Report Editor. Report
templates are customizable.

In the requirement model, artifacts can be linked 
to other model artifacts for traceability. The optional 
repository, which is a central storage location for all 
models, is secure. A login is required, and individuals 
or groups can be assigned specific permissions. You 
can browse through the artifacts in the repository 
and edit them after extraction. Conflict resolution 
management upon check-in lets you reject, overwrite, 
or merge changes from multiple modelers. There is a 
web-based user portal for the repository and a search 
engine. 

Even this most elegant of data modeling software 
has faults: 

* Relationship line management can be awkward, 
even difficult. 

* It doesn't have an option that lets you view all
the artifacts for models that are split into
packages.

* The trial period is only 15 days. Due to the 
complexity of the software, 15 days isn't enough 
time to get acquainted with the core capabilities, 
much less the extraordinary set of features that 
support enterprise architecture. 

* Until you've used this product long enough so 
that you recognize its power, it's hard to develop 
a viable argument for the purchase price. 


PowerDesigner is the only product of those
reviewed that goes beyond data analysis and modeling.
It reaches out to the enterprise and lets you model
the organization. In addition, you can tie documents
(static and otherwise) and databases into the
organizational framework model. PowerDesigner is by far
the most complete organizational modeling package
on the market. Right out of the box, you can model
business communications, city planning, organization
charts, process maps, application architecture,
service-oriented designs, and the technology infrastructure
in your shop, with no need to purchase additional
products.


POWERDESIGNER STUDIO ENTERPRISE 15.2


Pros: Supports the Zachman Framework; supports the
entire Unified Modeling Language (UML) spectrum;
integrates database and application schema and logic;
maps requirements to database artifacts; full standards
modification and enforcement capabilities


Cons: High cost; long learning curve


Rating:


Price: $4,794


Recommendation: If you need an extremely
high level of capability and functionality that will
enable you to model the enterprise, this is the
software for you.


Contact: Sybase (an SAP company) • 925-236-5000
or 800-792-2735 • www.sybase.com


Editor’s Choice 
All the data modeling packages I reviewed aim to 
ease your workload and help you create a better 
work product deliverable. Some achieve these goals 
better than others. The product that will work 
best for you depends on your job duties and your 
budget. 
The reviewed software falls into two general 
categories: 
* A classic package that's meant for serious data
or enterprise architects who are responsible for
all seven steps of the database lifecycle (i.e.,
discover, design, develop, deploy, day-to-day,
defend, and decommission).
* A lighter-weight product that's targeted at
developers and DBAs who might need to only
inventory or refactor existing databases.


PowerDesigner Studio Enterprise, DeZign for Data- 
bases, ER/Studio Enterprise, and ERwin Data Mod- 
eler are classic packages, whereas Toad Data Modeler 
for SQL Server and DatabaseSpy are lighter-weight 
products. 

For DBAs and developers, I recommend Toad 
Data Modeler, despite my struggles with file manage- 
ment. The package is intuitive, so it's easy to learn and 
use. It's also built to enforce modeling best practices, 
so it'll help grow your modeling skills. Besides, it's just 
plain fun to use. 

For data architects, I recommend one of two 
products. If your primary driver for choice is budget, 
go with DeZign for Databases. It has tons of capabili- 
ties with a high comfort and productivity factor, and 
is clearly a best buy. However, if your primary driver 
is standards enforcement, go with ER/Studio Enter- 
prise. This is the most powerful software package for 
data architects that I evaluated. 

For the enterprise architect, I recommend
PowerDesigner Studio Enterprise. It's an elegant 
product with numerous capabilities, which is why 
it deserves the title of “Editor's Choice.” It'll prob- 
ably be a few years before you learn how to take full 
advantage of it, but during that time, it'll guide you 
into using industry-standard best practices and help 
you hone your enterprise architect skill set.


2010 Best of Connections Awards


Penton Media's IT and Developer Group publications (including SQL Server
Magazine, DevProConnections, SharePointPro Connections, and Windows IT
Pro) presented Best of Connections awards to the following exhibitors at the Fall
2010 Connections conferences in Las Vegas, in recognition of their outstanding
products for ASP.NET/.NET/Visual Studio developers.


SQL Server 
Finalists: SQL Sentry Performance Advisor for SQL Server, Idera SQL diagnostic manager, 
and Red Gate Software SQL Source Control 


Winner: SQL Sentry Performance Advisor for SQL Server, www.sqlsentry.com 

SQL Sentry Performance Advisor for SQL Server is the winner of this year's Best of
Connections award in the SQL Server product category. Performance Advisor for SQL
Server can monitor both SQL Server and Analysis Services, and it displays all the
important performance metrics in an easy-to-view dashboard. It shows the resources
consumed by running processes, as well as any blocking and server deadlocks. Email
and pager notifications can be automatically forwarded when deadlocks are detected.
Performance Advisor for SQL Server can perform real-time performance monitoring and
keep a baseline history of past performance, allowing you to gauge performance usage
over time and detect any important trends. In addition to monitoring key performance
metrics, it provides a graphical view of your server's disk topology, enabling you to
see your server's controller, disk, and database file and log usage and capacity. An
integrated Quick Trace shows the active processes, and it can be set up to provide
automated monitoring based on thresholds set by the DBA. Performance Advisor also
provides full reporting using SQL Server Reporting Services. Performance reports can
be automatically emailed to subscribers or exported to PDF, Excel, or XML formats.
SQL Sentry Performance Advisor for SQL Server helps you get better performance out of
your SQL Server systems and makes system management more efficient and effective.


Visual Studio 
Finalists: Infragistics NetAdvantage Ultimate, ComponentOne Studio Enterprise, and 
GrapeCity ActiveReports 6 


Winner: Infragistics NetAdvantage Ultimate, www.infragistics.com 

NetAdvantage Ultimate, a UI control set for Windows Forms, WPF, Silverlight, and
ASP.NET applications, stands out because of its emphasis on providing optimal user
experience. A distinguishing feature of the product is its sophisticated data
visualization controls, including mapping controls that are crisp, clear, and easy to
interpret.


ASP.NET Silverlight 
Finalists: Aptimize Website Accelerator, ScaleOut Software ScaleOut StateServer, and 
ComponentOne Studio for Silverlight 


Winner: Aptimize Website Accelerator, www.aptimize.com 

Aptimize Website Accelerator is a product meant for every web development team that has 
fielded user complaints about website slowness. This accelerator lives up to its name, speeding 
up both ASP.NET and SharePoint websites with no code changes at all. 


For the complete list of winners in all categories, see www.windowsitpro.com/awards/


Savvy Assistants


Your guide to sponsored resources 


Practical Performance 
Monitoring & Tuning with 
Andrew Kelly 

Performance monitoring and tuning doesn't have 
to be rocket science! Attend this 3-day workshop 
and let SQL expert Andrew Kelly show you the 
techniques and tools you need to put an end to the 
bottlenecks in your own environment. 
sqlmag.com/go/performance 


A Quick-Start Guide to T-SQL 
Basics from Left-Brain.com 
T-SQL is the query language used by SQL Server 
and other database products, so knowing how to 
write T-SQL queries is an important skill for anyone 
who works with SQL Server databases. The 10 
lessons in the T-SQL 101 series can help you learn 
the basics! 

sqlmag.com/go/t-sql 


Subscribe to SharePointPro 
Connections FREE! 

Sign up now for the only magazine dedicated to
all things SharePoint. Every issue includes
field-tested information from experts in the SharePoint
community.

sqlmag.com/go/SharePoint


Performance monitoring and tuning doesn't have to be
rocket science.


Want to get more out of SQL Server? 
Expert Andrew Kelly can help. 

Join Andrew for a 3-day Essentials Workshop to 
learn the techniques and tools you need to 
identify and eliminate the bottlenecks in your 
environment. 


Practical Performance Monitoring & Tuning 
January 25-27, 2011 | Atlanta, GA
Learn more and register at sqlmag.com/go/tuning


BUSINESS INTELLIGENCE

ActiveAnalysis 2 Enhances BI Development

GrapeCity PowerTools has announced ActiveAnalysis 2, a free-form OLAP, data visualization, and business
intelligence component for Microsoft Silverlight, Windows Forms, and Microsoft ASP.NET. Major
enhancements include letting developers embed cross-browser data visualization and BI features into their
applications, and support for exporting on-screen data to a Microsoft Excel-compatible format so end users
can share and archive data offline. Other features include: flexible data binding; drag and drop interface;
charts, pivot tables, and data visualization in one control; and native support for SQL Server Analysis
Services. To learn more, visit www.gcpowertools.com.

[...]ments to products@sqlmag.com.
Brian Reinholz, editorial web architect


BUSINESS INTELLIGENCE

MarkLogic Announces MarkLogic Server 4.2 

MarkLogic has announced MarkLogic Server 4.2. The new version includes new features that let enterprises 
store, manage, and analyze unstructured information. Included in this release is MarkLogic Information Stu- 
dio, an ETL tool for managing unstructured information. New capabilities include: replicate documents across 
databases, set the database state at a specified point in time, failover response for a failed server, and search 
tuning. To learn more, visit www.marklogic.com.


DATABASE MONITORING

SQL Power Tools Updates Monitoring and Alerts

SQL Power Tools has released Zero Impact SQL Monitor 2.1. The
primary new features in this update are end-user experience
monitoring and automatic configuration of alert thresholds. With
end-user experience monitoring, users can view response time delivered
by the database server, with metrics such as average response time,
server time, network time, rows returned, and more. The automatic
configuration of alert thresholds offers a performance counter
that can be automatically set up to check baselines. You can also
set the counter to record trend increases. To learn more, visit
www.sqlpower.com.


BUSINESS INTELLIGENCE 

Lyzasoft and Predixion Combine for Cloud-to- 
Cloud Analytics 

Predixion Software has integrated its predictive analytics solution into the Lyza Commons analytical collabo- 
ration solution. The result is that users can access Predixion’s self-service predictive analytics from within the 
Lyza Commons interface, removing the need to log into multiple applications to access analytics. Lyzasoft 
has integrated the following predictive analytics solutions from
Predixion: Shopping Basket Analysis, Forecast, Detect Categories,
and Analyze Key Influencers. To learn more, visit www.lyzasoft.com
or www.predixionsoftware.com.


DATABASE DEVELOPMENT 

EMS Releases Basix:Builder 3.0 

EMS Solutions has announced Basix:Builder 3.0, a rapid applica- 
tion development tool designed to build and deploy business appli- 
cations on SQL Server. The tool simplifies building robust business 
software with a basic menu structure, model design, and form 
layouts. Basix:Builder also offers document integration, workflow, 
charting, auditing, security, business logic, automated processing, 
and reporting, all without programming. To learn more, visit
www.basixcentre.com.


Michael Otey


(motey@sqlmag.com) is senior technical
director for Windows IT Pro and SQL
Server Magazine and author of Microsoft
SQL Server 2008 New Features (Osborne/
McGraw-Hill).


At its recent Professional Developers Conference
(PDC) in Redmond, Microsoft reaffirmed its
commitment to SQL Azure, the new cloud-based ver- 
sion of SQL Server, by announcing several important 
enhancements for SQL Azure. Microsoft is serious 
about addressing customers’ needs with the SQL 
Azure platform and bringing it more on par with the 
capabilities of an on-premises SQL Server installa- 
tion. Four of the most important recent announce- 
ments for SQL Azure follow: 


Database Backup 

Microsoft announced before PDC that it was adding
to SQL Azure Service Update 4 the ability to back up
SQL Azure databases. I’m not really sure what Microsoft
was thinking when it released the earlier versions
without the ability to perform backups. Perhaps it
thought that SQL Azure's built-in availability obviated
the need to perform backups.

However, it overlooked the need to provide protection
against end-user error. Backing up with bcp or SQL
Server Integration Services (SSIS) wasn't a suitable
replacement for database backup.

With SQL Azure Service Update 4, you can use 
the new copy feature to make SQL Azure-based data- 
base backups. Being copies of the database, they do 
count toward the SQL Azure limit of 150 databases. 
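As a rough illustration of the copy feature, the sketch below builds the T-SQL statement that, to my knowledge, SQL Azure uses for database copies: CREATE DATABASE ... AS COPY OF. The helper names and the database names in the usage note are hypothetical, and the code only builds the statement text. Running it against a server (for example, through an ODBC connection) is left out.

```python
def quote_identifier(name):
    """Bracket-quote a T-SQL identifier, doubling any closing brackets."""
    return "[" + name.replace("]", "]]") + "]"


def copy_database_statement(source_db, backup_db, source_server=None):
    """Build a CREATE DATABASE ... AS COPY OF statement.

    source_server is optional; when given, the source is written as
    [server].[database] to copy from another SQL Azure server.
    """
    source = quote_identifier(source_db)
    if source_server is not None:
        source = quote_identifier(source_server) + "." + source
    return f"CREATE DATABASE {quote_identifier(backup_db)} AS COPY OF {source};"
```

For example, copy_database_statement("Sales", "Sales_backup") produces a statement that would copy a hypothetical Sales database to Sales_backup. Because each copy is a full database, it counts toward the database limit mentioned above, so old copies need to be dropped once they're no longer needed.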

SQL Azure database backup is available for SQL
Azure now. Learn more about it at
msdn.microsoft.com/en-us/library/ff951624.aspx.


Database Manager for SQL 
Azure 

In the past, managing SQL Azure databases was 
more difficult than managing on-premises systems, 
mainly because of the lack of management tools. As 
part of SQL Server 2008 R2, SQL Server Manage- 
ment Studio was modified to be able to connect to 
SQL Azure. 

You can find the free SQL Azure-compatible
version of SQL Server 2008 R2 Management
Studio Express at
www.microsoft.com/downloads/details.aspx?familyid=56ad557c-03e6-4369-9c1d-e81b33d8026b&displaylang=en.


4 SQL Azure Enhancements


However, this still means using an on-premises tool
to manage your databases in the cloud. Database Manager
for SQL Azure is a free web-based management tool that
can be used to create schema and run queries against
SQL Azure databases. Watch a video demo at
blogs.msdn.com/b/sqlazure/archive/2010/10/29/10083115.aspx.


SQL Azure Data Sync 

Tacitly acknowledging that SQL Azure will need to 
work in conjunction with one or more on-premises 
SQL Server systems, Microsoft announced the SQL 
Azure Data Sync feature. SQL Azure Data Sync is a
cloud-based data synchronization service that's built 
using the Microsoft Sync Framework. 

It will be able to synchronize data between on- 
premises SQL Server systems and SQL Azure in the 
cloud. It can also replicate data to remote offices, 
and it will support scheduled synchronization and 
conflict handling for duplicate data. 

A second SQL Azure Data Sync CTP should
appear by the end of 2010, and the service is
expected in the first half of 2011. Learn more about
SQL Azure Data Sync and download the first
CTP at www.microsoft.com/en-us/sqlazure/datasync.aspx.


SQL Azure Reporting 

Without a doubt, the most important new announcement
at PDC was support for SQL Azure Reporting
Services. Reporting Services is one of the most
important features of an on-premises SQL Server
installation, and it was definitely needed to drive
adoption of SQL Azure.

With SQL Azure Reporting Services, reports can
be created using Business Intelligence Development
Studio (BIDS), published to SQL Azure, and
managed using the cloud-based Windows Azure
Developer Portal. SQL Azure Reporting is expected
to be available in a CTP by the end of 2010 and
generally available in the first half of 2011.
See the video demoing the new service at
player.microsoftpdc.com/Session/5007e9c3-03cd-41b4-9elc-4eb1l7cd60e37.